Present claim 1 relates to an extremely large number of possible
compounds, i.e. monooxygenases. Support within the meaning of Article 6
and/or disclosure within the meaning of Article 5 PCT is to be found,
however, for only a very small proportion of the monooxygenases claimed.
In the present case, the claims so lack support, and the application so
lacks disclosure, that a meaningful search over the whole of the claimed
scope is impossible. Consequently, the search has been carried out for
those parts of the claims which appear to be supported and disclosed,
namely those parts relating to the monooxygenase references incorporated
in the description, i.e. P450BM-3 (CYP102) described in pages 37-38 and
43 (for the construction of chimeric P450s); styrene monooxygenase of
Pseudomonas sp. strain VLB120 (stdSc, stdR, stdA, stdB, stdC and stdD
genes) described in pages 47 and 52 and corresponding to GenBank
accession number AF031161 (for the epoxidation of olefins and degradation
of methyl-substituted aromatic compounds); P450 CYP2B subfamily described
in page 50 (for omega-hydroxylation of fatty acids and detoxification
activity); P450cam (CYP2C9) described in page 54 (for the dehydrogenation
reactions); P450 3A described in page 56 (for the obtention of
cyclosporin); P450sca described in pages 56-57 (for the obtention of
pravastatin); SuaC CYP105A1 and SubC CYP105B1 described in page 58 (for
herbicide resistance and bioremediation); and Pseudomonas putida OUS82 as
described in page 86, corresponding to GenBank accession number AB004059
(monooxygenases acting as dioxygenases, for alkyl group monooxygenation).

The applicant's attention is drawn to the fact that claims, or parts of 
claims relating to inventions in respect of which no international 
search report has been established need not be the subject of an
international preliminary examination (Rule 66.1(e) PCT). The applicant
is advised that the EPO policy when acting as an International
Preliminary Examining Authority is normally not to carry out a
preliminary examination on matter which has not been searched. This is
the case irrespective of whether or not the claims are amended following
receipt of the search report or during any Chapter II procedure.

METHOD TO SCREEN HERBICIDAL COMPOUNDS UTILIZING AIR
SYNTHETASE FROM ARABIDOPSIS THALIANA 

The invention relates to methods for screening herbicidal compounds which inhibit 
the enzymatic activity of 5'-phosphoribosyl-5-aminoimidazole (AIR) synthetase, an enzyme 
involved in de novo purine biosynthesis. The invention also relates to the use of thereby 
identified herbicidal chemicals to control the growth of undesired vegetation. The invention 
may also be applied to the development of herbicide tolerance in plants, plant tissues, plant 
seeds, and plant cells. 

AIR synthetase catalyzes an enzymatic step in the de novo purine biosynthesis
pathway, which leads to the synthesis of the purine nucleotides IMP, AMP and GMP. De
novo purine biosynthesis plays a central role in the nitrogen assimilation pathway and is
conserved among bacteria, yeast, Drosophila and mammals (Schnorr et al. (1994) The
Plant Journal. 6: 113-121). The AIR synthetase enzymatic activity corresponds to the fifth
step in the pathway and catalyzes the conversion of
5'-phosphoribosyl-N-formylglycinamidine (FGAM) to
5'-phosphoribosyl-5-aminoimidazole (AIR). In E. coli, this
step is carried out by a protein encoded by the purM gene. Recently, an Arabidopsis cDNA
encoding an enzyme having AIR synthetase activity has been cloned and its sequence has
been determined (Senecoff and Meagher (1993) Plant Physiol. 102: 387-399; Schnorr et al.
(1994) The Plant Journal. 6: 113-121).

The use of herbicides to control undesirable vegetation such as weeds in crop fields 
has become almost a universal practice. The herbicide market exceeds 15 billion dollars 
annually. Despite this extensive use, weed control remains a significant and costly problem
for farmers. 

Effective use of herbicides requires sound management. For instance, the time and 
method of application and stage of weed plant development are critical to getting good 
weed control with herbicides. Since various weed species are resistant to herbicides, the 
production of effective new herbicides becomes increasingly important. Novel herbicides 
can now be discovered using high-throughput screens that implement recombinant DNA 
technology. Metabolic enzymes found to be essential to plant growth and development can 
be recombinantly produced through standard molecular biological techniques and utilized as
herbicide targets in screens for novel inhibitors of the enzymes' activity. The novel 
inhibitors discovered through such screens may then be used as herbicides to control
undesirable vegetation.

Herbicides that exhibit greater potency, broader weed spectrum, and more rapid
degradation in soil can also, unfortunately, have greater crop phytotoxicity. One solution 
applied to this problem has been to develop crops that are resistant or tolerant to 
herbicides. Crop hybrids or varieties tolerant to the herbicides allow for the use of the
herbicides to kill weeds without attendant risk of damage to the crop. Development of 
tolerance can allow application of a herbicide to a crop where its use was previously 
precluded or limited (e.g. to pre-emergence use) due to sensitivity of the crop to the 
herbicide. For example, U.S. Patent No. 4,761,373 to Anderson et al. is directed to plants
resistant to various imidazolinone or sulfonamide herbicides. The resistance is conferred by
an altered acetohydroxyacid synthase (AHAS) enzyme. U.S. Patent No. 4,975,374 to
Goodman et al. relates to plant cells and plants containing a gene encoding a mutant
glutamine synthetase (GS) resistant to inhibition by herbicides that were known to inhibit
GS, e.g. phosphinothricin and methionine sulfoximine. U.S. Patent No. 5,013,659 to
Bedbrook et al. is directed to plants expressing a mutant acetolactate synthase that renders
the plants resistant to inhibition by sulfonylurea herbicides. U.S. Patent No. 5,162,602 to
Somers et al. discloses plants tolerant to inhibition by cyclohexanedione and
aryloxyphenoxypropanoic acid herbicides. The tolerance is conferred by an altered acetyl 
coenzyme A carboxylase (ACCase). 

One object of the present invention is to provide methods for identifying new or 
improved herbicides. Another object of the invention is to provide methods for using such 
new or improved herbicides to suppress the growth of plants such as weeds. Still another 
object of the invention is to provide improved crop plants that are tolerant to such new or
improved herbicides. 

Using an antisense validation system which allows for the inactivation of expression 
of an endogenous gene, the inventors of the present invention have demonstrated that the 
5'-phosphoribosyl-5-aminoimidazole (AIR) synthetase activity is essential in plants. This 
implies that chemicals which inhibit AIR synthetase in plants are likely to have detrimental 
effects on plants and are potentially good herbicide candidates. The present invention 
therefore provides methods of using a purified AIR synthetase to identify inhibitors thereof, 
which can then be used as herbicides to suppress the growth of undesirable vegetation,
e.g. in fields where crops are grown, particularly agronomically important crops such as
maize and other cereal crops such as wheat, oats, rye, sorghum, rice, barley, millet, turf and
forage grasses, and the like, as well as cotton, sugar cane, sugar beet, oilseed rape, and 
soybeans. 

The present invention discloses for the first time the correct nucleotide sequence of 
the Arabidopsis AIR synthetase gene. The nucleotide sequence encoding the pre-protein is
set forth in SEQ ID NO:1 and the nucleotide sequence encoding the putative mature
protein is set forth in SEQ ID NO:3. The correct amino acid sequence of the Arabidopsis
AIR synthetase pre-protein is set forth in SEQ ID NO:2 and the correct amino acid
sequence of the putative mature Arabidopsis AIR synthetase is set forth in SEQ ID NO:4.
The present invention also encompasses isolated enzymes having AIR synthetase activity
and comprising an amino acid sequence that is identical or substantially similar to the amino
acid sequences set forth in SEQ ID NO:2 or SEQ ID NO:4. Preferably, the amino acid
sequence is derived from a plant. 

The present invention also encompasses an isolated nucleic acid molecule 
comprising a nucleotide sequence that encodes the amino acid sequence set forth in SEQ 
ID NO:2 or SEQ ID NO:4. Preferably, the nucleotide sequence is SEQ ID NO:1 or SEQ ID
NO:3. In another embodiment, the nucleotide sequence is deposited in E. coli strain 
DH5α pASM designated as NRRL accession number B-21976. Also encompassed by the
present invention are a chimeric gene comprising a heterologous promoter sequence 
operatively linked to the nucleic acid molecule of the invention; a recombinant vector 
comprising such a chimeric gene; and a host cell comprising such a chimeric gene. 
Preferably, the host cell is a bacterial cell, a yeast cell, or a plant cell. The present invention 
also encompasses a plant comprising a plant cell of the invention and seed from such a 
plant. 

In a preferred embodiment, the present invention describes a method of identifying 
chemicals having the ability to inhibit plant growth or viability, comprising: (a) combining an
enzyme having AIR synthetase activity in a first reaction mixture with a substrate of AIR 
synthetase under conditions in which the enzyme is capable of catalyzing the synthesis of 
AIR; (b) combining the chemical to be tested and the enzyme in a second reaction mixture 
with a substrate of AIR synthetase under the same conditions and for the same period of 
time as in the first reaction mixture; (c) determining and comparing the activity of the 
enzyme in the first and second reaction mixtures; wherein less, desirably significantly less, 
enzyme activity in the second reaction mixture than in the first reaction mixture indicates 
that the chemical of (b) has the ability to inhibit plant growth or viability. In a preferred
embodiment, the substrate of AIR synthetase is 5'-phosphoribosyl-N-formylglycinamidine
(FGAM) and in a further preferred embodiment, the substrate of AIR synthetase is b-FGAM.
In another preferred embodiment, the enzyme having AIR synthetase activity is derived
from a plant and more preferably, is encoded by a nucleotide sequence identical or
substantially similar to the nucleotide sequence set forth in SEQ ID NO:1 or SEQ ID NO:3.
In another embodiment, the AIR synthetase enzyme is encoded by a nucleotide sequence
capable of encoding the amino acid sequence of SEQ ID NO:2 or SEQ ID NO:4. In yet
another embodiment, the AIR synthetase enzyme has an amino acid sequence identical or
substantially similar to the amino acid sequence set forth in SEQ ID NO:2 or SEQ ID NO:4.
In another preferred embodiment, the chemical is capable of inhibiting the growth or viability
of a plant by inhibiting the activity of AIR synthetase in the plant. In yet another preferred
embodiment, the activity of the enzyme is determined by measuring the AIR produced in the 
reaction mixture. In another preferred embodiment, the activity of the enzyme is determined 
by measuring the ADP derived from ATP in the reaction mixture. 
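
By way of non-limiting illustration, the comparison of step (c) between the first (control) and second (test) reaction mixtures may be sketched as follows. The function names, the activity readings and the margin of error are hypothetical and are assumed here for illustration only; they are not taken from the described method.

```python
def percent_inhibition(control_activity: float, test_activity: float) -> float:
    """Inhibition in the test mixture relative to the control mixture, in percent."""
    if control_activity <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (1.0 - test_activity / control_activity)


def is_candidate_inhibitor(control_activity: float,
                           test_activity: float,
                           margin_of_error: float) -> bool:
    """True when the drop in activity exceeds the measurement margin of error."""
    return (control_activity - test_activity) > margin_of_error


# Hypothetical readings, e.g. AIR produced or ADP formed (arbitrary units).
control = 120.0        # first reaction mixture: enzyme + substrate
with_chemical = 30.0   # second reaction mixture: enzyme + substrate + chemical

print(percent_inhibition(control, with_chemical))
print(is_candidate_inhibitor(control, with_chemical, margin_of_error=5.0))
```

Both mixtures are assumed to have been run under the same conditions and for the same period of time, so that the two readings are directly comparable.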

In another preferred embodiment, the present invention describes a method of 
identifying chemicals having the ability to inhibit plant growth or viability, comprising: (a) 
combining an enzyme having 5'-phosphoribosyl-N-formylglycinamidine (FGAM) synthetase 
activity and an enzyme having AIR synthetase activity in a first reaction mixture with a 
substrate of FGAM synthetase under conditions in which the enzymes are capable of 
catalyzing the coupled synthesis of AIR; (b) combining a chemical to be tested and the
enzymes in a second reaction mixture with a substrate of FGAM synthetase under the same
conditions and for the same period of time as in the first reaction mixture; and (c) determining
and comparing the activity of the enzyme having AIR synthetase activity in the first and 
second reaction mixtures; wherein less, preferably significantly less, AIR synthetase 
enzyme activity in the second reaction mixture than in the first reaction mixture indicates 
that the chemical of (b) has the ability to inhibit plant growth or viability. In a preferred 
embodiment, the substrate of FGAM synthetase is 5'-phosphoribosyl-N-formylglycinamide 
(FGAR) and in a further preferred embodiment, the substrate of FGAM synthetase is b- 
FGAR. In another preferred embodiment, the enzyme having AIR synthetase activity is 
derived from a plant and more preferably, is encoded by a nucleotide sequence identical or 
substantially similar to the nucleotide sequence set forth in SEQ ID NO:1 or SEQ ID NO:3.
In another embodiment, the AIR synthetase enzyme is encoded by a nucleotide sequence
capable of encoding the amino acid sequence of SEQ ID NO:2 or SEQ ID NO:4. In yet
another embodiment, the AIR synthetase enzyme has an amino acid sequence identical or
substantially similar to the amino acid sequence set forth in SEQ ID NO:2 or SEQ ID NO:4.
In another preferred embodiment, the chemical is capable of inhibiting the growth or viability 
of a plant by inhibiting the activity of AIR synthetase in the plant. In yet another preferred 
embodiment, the activity of the enzyme is determined by measuring the AIR produced in the 
reaction mixture. In another preferred embodiment, the activity of the enzyme is determined 
by measuring the ADP derived from ATP in the reaction mixture. 

The present invention also further describes an assay comprising the steps of: (a) 
combining an enzyme having 5'-phosphoribosyl-N-formylglycinamidine (FGAM) synthetase
activity and an enzyme having AIR synthetase activity in a first reaction mixture with a
substrate of FGAM synthetase under conditions in which the enzymes are capable of
catalyzing the coupled synthesis of AIR; (b) combining a chemical and the enzymes in a
second reaction mixture with a substrate of FGAM synthetase under the same conditions 
and for the same period of time as in the first reaction mixture; (c) determining the activity of 
the enzyme having AIR synthetase activity in the first and second reaction mixtures; 
wherein the chemical is capable of inhibiting the activity of the enzyme having AIR 
synthetase activity if the activity of the enzyme having AIR synthetase activity in the second 
reaction mixture is less, desirably significantly less, than the activity of the enzyme having 
AIR synthetase activity in the first reaction mixture. In a preferred embodiment, the 
substrate of FGAM synthetase is 5'-phosphoribosyl-N-formylglycinamide (FGAR) and in a 
further preferred embodiment, the substrate of FGAM synthetase is b-FGAR. In yet another
preferred embodiment, the activity of the enzyme is determined by measuring the AIR
produced in the reaction mixture. In another preferred embodiment, the reaction mixture
comprises ATP and the activity of the enzyme is determined by measuring the ADP derived
from ATP in the reaction mixture. 

In another preferred embodiment, the present invention describes a method for
identifying chemicals having herbicidal activity that inhibit AIR synthetase activity in plants,
comprising: (a) obtaining transgenic plants, plant tissue, plant seeds or plant cells
comprising an isolated nucleotide sequence encoding an enzyme having AIR synthetase 
activity and capable of overexpressing an enzymatically active AIR synthetase; (b) applying 
a chemical to be tested to the transgenic plants, plant cells, tissues or parts and to the 
isogenic non-transformed plants, plant cells, tissues or parts; (c) determining the growth or 
viability of the transgenic and non-transformed plants, plant cells, tissues after application of 
the chemical; and (d) comparing the growth or viability of the transgenic and non- 
transformed plants, plant cells, tissues after application of the chemical; wherein
suppression of the growth or viability of the non-transgenic plants, plant cells, tissues or 
parts, without significantly suppressing the growth or viability of the isogenic transgenic 
plants, plant cells, tissues or parts indicates that the chemical of (b) has herbicidal activity 
that inhibits AIR synthetase activity in plants. Desirably, the chemical suppresses the 
viability or growth of the non-transgenic plants, plant cells, tissues or parts, without
significantly suppressing the viability or growth of the isogenic transgenic
plants, plant cells, tissues or parts. In a preferred embodiment, the enzyme having AIR 
synthetase activity is encoded by a nucleotide sequence identical or substantially similar to 
the nucleotide sequence set forth in SEQ ID NO:1 or SEQ ID NO:3. In another embodiment,
the AIR synthetase enzyme is encoded by a nucleotide sequence capable of encoding the
amino acid sequence of SEQ ID NO:2 or SEQ ID NO:4. In yet another embodiment, the AIR
synthetase enzyme has an amino acid sequence identical or substantially similar to the
amino acid sequence set forth in SEQ ID NO:2 or SEQ ID NO:4.
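
As a non-limiting sketch, the comparison of step (d) between transgenic and isogenic non-transformed material may be expressed as follows. The function names, the fresh-weight readings and the 0.8 threshold for "not significantly suppressed" are hypothetical assumptions made for illustration and do not form part of the described method.

```python
def relative_growth(treated: float, untreated: float) -> float:
    """Growth after treatment as a fraction of growth without treatment."""
    if untreated <= 0:
        raise ValueError("untreated growth must be positive")
    return treated / untreated


def indicates_air_synthetase_inhibitor(non_transformed_rel: float,
                                       transgenic_rel: float,
                                       threshold: float = 0.8) -> bool:
    """True when non-transformed material is suppressed while the
    overexpressing transgenic material is not (threshold is an assumption)."""
    return non_transformed_rel < threshold and transgenic_rel >= threshold


# Hypothetical fresh-weight readings (mg), treated versus untreated.
non_transformed = relative_growth(treated=20.0, untreated=80.0)
transgenic = relative_growth(treated=76.0, untreated=80.0)

print(indicates_air_synthetase_inhibitor(non_transformed, transgenic))
```

A chemical that suppresses both kinds of material equally would not be flagged by this sketch, since such a result does not point to AIR synthetase as the site of action.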

The present invention further embodies plants, plant tissues, plant seeds, and plant 
cells that have modified AIR synthetase activity and that are therefore tolerant to inhibition 
by a herbicide at levels normally inhibitory to naturally occurring AIR synthetase activity. 
Herbicide tolerant plants encompassed by the invention include those that would otherwise
be potential targets for normally inhibiting herbicides, particularly the agronomically 
important crops mentioned above. According to this embodiment, plants, plant tissue, plant 
seeds, or plant cells are transformed, preferably stably transformed, with a recombinant 
DNA molecule comprising a suitable promoter functional in plants operatively linked to a 
nucleotide coding sequence that encodes a modified AIR synthetase that is tolerant to 
inhibition by a herbicide at a concentration that would normally inhibit the activity of wild- 
type, unmodified AIR synthetase. Modified AIR synthetase activity may also be conferred 
upon a plant by increasing expression of wild-type herbicide-sensitive AIR synthetase by
providing multiple copies of wild-type AIR synthetase genes to the plant or by 
overexpression of wild-type AIR synthetase genes under control of a stronger-than-wild- 
type promoter. The transgenic plants, plant tissue, plant seeds, or plant cells thus created 
are then selected by conventional selection techniques, whereby herbicide tolerant lines are 
isolated, characterized, and developed. Alternately, random or site-specific mutagenesis 
may be used to generate herbicide tolerant lines. 

Therefore, the present invention provides a plant, plant cell, plant seed, or plant 
tissue transformed with a DNA molecule comprising a nucleotide sequence isolated from a 
plant that encodes an enzyme having AIR synthetase activity, wherein the enzyme has AIR
synthetase activity and wherein the DNA molecule confers upon the plant, plant cell, plant
seed, or plant tissue tolerance to a herbicide in amounts that normally inhibit naturally
occurring AIR synthetase activity. According to one example of this embodiment, the
enzyme having AIR synthetase activity is encoded by a nucleotide sequence identical or
substantially similar to the nucleotide sequence set forth in SEQ ID NO:1 or SEQ ID NO:3,
or has an amino acid sequence identical or substantially similar to the amino acid sequence
set forth in SEQ ID NO:2 or SEQ ID NO:4.

The invention also provides a method for suppressing the growth of a plant
comprising the step of applying to the plant a chemical that inhibits the naturally occurring 
AIR synthetase activity in the plant. In a related aspect, the present invention is directed to a 
method for selectively suppressing the growth of weeds in a field containing a crop of 
planted crop seeds or plants, comprising the steps of: (a) planting herbicide tolerant crops 
or crop seeds, which are plants or plant seeds that are tolerant to a herbicide that inhibits 
the naturally occurring AIR synthetase activity; and (b) applying to the crops or crop seeds 
and the weeds in the field a herbicide in amounts that inhibit naturally occurring AIR 
synthetase activity, wherein the herbicide suppresses the growth of the weeds without 
significantly suppressing the growth of the crops. 

The present invention further provides a method for forming a mutagenized DNA 
molecule encoding an enzyme having AIR synthetase activity from a template DNA 
molecule encoding an enzyme having AIR synthetase activity, wherein said template DNA 
molecule has been cleaved into double-stranded-random fragments, comprising the steps 
of: (a) adding to the resultant population of double-stranded-random fragments at least one 
single-stranded or double-stranded oligonucleotide, wherein said oligonucleotide comprises 
an area of identity and an area of heterology to the template DNA molecule; (b) denaturing 
the resultant mixture of double-stranded-random fragments and oligonucleotides into single- 
stranded molecules; (c) incubating the resultant population of single-stranded molecules 
with a polymerase under conditions which result in the annealing of said single-stranded 
molecules at said areas of identity to form pairs of annealed fragments, said areas of 
identity being sufficient for one member of a pair to prime replication of the other, thereby 
forming a mutagenized double-stranded polynucleotide; (d) repeating the second and third 
steps for at least two further cycles, wherein the resultant mixture in the second step of a 
further cycle includes the mutagenized double-stranded polynucleotide from the third step 
of the previous cycle, and the further cycle forms a further mutagenized double-stranded 
polynucleotide; wherein the mutagenized double-stranded polynucleotide encodes an AIR
synthetase enzyme having enhanced tolerance to a herbicide which inhibits the AIR
synthetase activity encoded by the template DNA molecule. Also provided is a
mutagenized DNA molecule encoding an enzyme having AIR synthetase activity obtained
by the above method, wherein said mutagenized DNA molecule encodes an AIR synthetase
enzyme having enhanced tolerance to a herbicide which inhibits the AIR synthetase activity
encoded by said template DNA molecule.

The present invention also provides a method for forming a mutagenized DNA
molecule encoding an enzyme having AIR synthetase activity from at least two non-identical
template DNA molecules encoding enzymes having AIR synthetase activity, comprising the
steps of: (a) adding to the template DNA molecules at least one oligonucleotide comprising
an area of identity to each of the template DNA molecules; (b) denaturing the resultant
mixture into single-stranded molecules; (c) incubating the resultant population of
single-stranded molecules with a polymerase under conditions which result in the annealing of the
oligonucleotides to the template DNA molecules, wherein the conditions for polymerization
by the polymerase are such that polymerization products corresponding to a portion of the
template DNA molecules are obtained; (d) repeating the second and third steps for at least
two further cycles, wherein the extension products obtained in the third step are able to
switch template DNA molecule for polymerization in the next cycle, thereby forming a
mutagenized double-stranded polynucleotide comprising sequences derived from different
template DNA molecules; wherein the mutagenized double-stranded polynucleotide
encodes an AIR synthetase enzyme having enhanced tolerance to a herbicide which
inhibits the AIR synthetase activity encoded by the template DNA molecules. Also provided
is a mutagenized DNA molecule encoding an enzyme having AIR synthetase activity
obtained by the above method, wherein said mutagenized DNA molecule encodes an AIR
synthetase enzyme having enhanced tolerance to a herbicide which inhibits the AIR
synthetase activity encoded by said template DNA molecule.

Preferably, according to either of the above two methods, at least one template DNA
molecule is derived from a eukaryote. More preferably, said eukaryote is a plant. Still more
preferably, said plant is Arabidopsis thaliana. Most preferably, said species of template
DNA molecule is identical or substantially similar to SEQ ID NO:1 or SEQ ID NO:3. In
another embodiment of either of the above two methods, at least one template DNA
molecule is derived from a prokaryote. 
Other objects and advantages of the present invention will become apparent to
those skilled in the art from a study of the following description of the invention and non- 
limiting examples. 

For clarity, certain terms used in the specification are defined and presented as
follows:

Activatable DNA Sequence: a DNA sequence that regulates the expression of 
genes in a genome, desirably the genome of a plant. The activatable DNA sequence is 
complementary to a target gene endogenous in the genome. When the activatable DNA 
sequence is introduced and expressed in a cell, it inhibits expression of the target gene. An 
activatable DNA sequence useful in conjunction with the present invention includes those 
encoding or acting as dominant inhibitors, such as a translatable or untranslatable sense 
sequence capable of disrupting gene function in stably transformed plants to positively 
identify one or more genes essential for normal growth and development of a plant. A 
preferred activatable DNA sequence is an antisense DNA sequence. The target gene 
preferably encodes a protein, such as a biosynthetic enzyme, receptor, signal transduction 
protein, structural gene product, or transport protein that is essential to the growth or 
survival of the plant. In an especially preferred embodiment, the target gene encodes an
enzyme having AIR synthetase activity. The interaction of the antisense sequence and the 
target gene results in substantial inhibition of the expression of the target gene so as to kill 
the plant, or at least inhibit normal plant growth or development. 

Activatable DNA Construct: a recombinant DNA construct comprising a synthetic 
promoter operatively linked to the activatable DNA sequence, which when introduced into a 
cell, desirably a plant cell, is not expressed, i.e. is silent, unless a complete hybrid 
transcription factor capable of binding to and activating the synthetic promoter is present. 
The activatable DNA construct is introduced into cells, tissues, or plants to form stable 
transgenic lines capable of expressing the activatable DNA sequence. 

Co-factor: a natural reactant, such as an organic molecule or a metal ion, required in
an enzyme-catalyzed reaction. A co-factor is e.g. NAD(P), riboflavin (including FAD and
FMN), folate, molybdopterin, thiamin, biotin, lipoic acid, pantothenic acid and coenzyme A,
S-adenosylmethionine, pyridoxal phosphate, ubiquinone, menaquinone.

Coupled synthesis: an enzymatic biosynthesis, in which a final product is synthesized
by two sequential enzymatic steps, wherein the substrate for the first enzymatic step is 
converted by the first enzyme to an intermediate product, which serves as a substrate for 
the second enzymatic step and is converted by the second enzyme to the final product,
without external addition of the intermediate product. 

DNA shuffling: DNA shuffling is a method to introduce mutations or rearrangements, 
preferably randomly, in a DNA molecule or to generate exchanges of DNA sequences 
between two or more DNA molecules, preferably randomly. The DNA molecule resulting 
from DNA shuffling is a shuffled DNA molecule that is a non-naturally occurring DNA
molecule derived from at least one template DNA molecule. The shuffled DNA encodes an
enzyme modified with respect to the enzyme encoded by the template DNA, and preferably
has an altered biological activity with respect to the enzyme encoded by the template DNA. 

Enzyme activity: means herein the ability of an enzyme to catalyze the conversion of 
a substrate into a product. A substrate for the enzyme comprises the natural substrate of 
the enzyme but also comprises analogues of the natural substrate which can also be 
converted by the enzyme into a product or into an analogue of a product. The activity of the
enzyme is measured for example by determining the amount of product in the reaction after
a certain period of time, or by determining the amount of substrate remaining in the reaction
mixture after a certain period of time. The activity of the enzyme is also measured by
determining the amount of an unused co-factor of the reaction remaining in the reaction
mixture after a certain period of time or by determining the amount of used co-factor in the
reaction mixture after a certain period of time. The activity of the enzyme is also measured
by determining the amount of a donor of free energy or energy-rich molecule (e.g. ATP,
phosphoenolpyruvate, acetyl phosphate or phosphocreatine) remaining in the reaction
mixture after a certain period of time or by determining the amount of a used donor of free
energy or energy-rich molecule (e.g. ADP, pyruvate, acetate or creatine) in the reaction
mixture after a certain period of time. 

Herbicide: a chemical substance used to kill or suppress the growth of plants, plant
cells, plant seeds, or plant tissues.

Heterologous DNA Sequence: a DNA sequence not naturally associated with a host
cell into which it is introduced, including non-naturally occurring multiple copies of a
naturally occurring DNA sequence.

Homologous DNA Sequence: a DNA sequence naturally associated with a host cell
into which it is introduced.

Inhibitor: a chemical substance that inactivates the enzymatic activity of a protein 
such as a biosynthetic enzyme, receptor, signal transduction protein, structural gene 
product, or transport protein that is essential to the growth or survival of the plant. In the
context of the instant invention, an inhibitor is a chemical substance that inactivates the
enzymatic activity of AIR synthetase from a plant. The term "herbicide" is used herein to 
define an inhibitor when applied to plants, plant cells, plant seeds, or plant tissues. 

Isogenic: plants which are genetically identical, except that they may differ by the 
presence or absence of a transgene. 

Isolated: in the context of the present invention, an isolated DNA molecule or an 
isolated enzyme is a DNA molecule or enzyme that, by the hand of man, exists apart from
its native environment and is therefore not a product of nature. An isolated DNA molecule 
or enzyme may exist in a purified form or may exist in a non-native environment such as, for 
example, a transgenic host cell. 

Mature protein: protein which is normally targeted to a cellular organelle, such as a 
chloroplast, and from which the transit peptide has been removed. 

Minimal Promoter: promoter elements, particularly a TATA element, that are inactive 
or that have greatly reduced promoter activity in the absence of upstream activation. In the 
presence of a suitable transcription factor, the minimal promoter functions to permit 
transcription. 

Modified Enzyme Activity: enzyme activity different from that which naturally occurs 
in a plant (i.e. enzyme activity that occurs naturally in the absence of direct or indirect 
manipulation of such activity by man), which is tolerant to inhibitors that inhibit the naturally 
occurring enzyme activity. 

Pre-protein: protein which is normally targeted to a cellular organelle, such as a 
chloroplast, and still comprising its transit peptide. 

Significant Increase: an increase in enzymatic activity that is larger than the margin 
of error inherent in the measurement technique, preferably an increase by about 2-fold or 
greater of the activity of the wild-type enzyme in the presence of the inhibitor, more 
preferably an increase by about 5-fold or greater, and most preferably an increase by about 
10-fold or greater. 

Significantly less: means that the amount of a product of an enzymatic reaction is 
reduced by more than the margin of error inherent in the measurement technique, preferably a 
decrease by about 2-fold or greater of the activity of the wild-type enzyme in the absence of 
the inhibitor, more preferably a decrease by about 5-fold or greater, and most preferably 
a decrease by about 10-fold or greater. 

In its broadest sense, the term "substantially similar", when used herein with respect 
to a nucleotide sequence, means a nucleotide sequence corresponding to a reference 
nucleotide sequence, wherein the corresponding sequence encodes a polypeptide having 
substantially the same structure and function as the polypeptide encoded by the reference 
nucleotide sequence, e.g. where only changes in amino acids not affecting the polypeptide 
function occur. Desirably the substantially similar nucleotide sequence encodes the 
polypeptide encoded by the reference nucleotide sequence. The percentage of identity 
between the substantially similar nucleotide sequence and the reference nucleotide 
sequence desirably is at least 65%, more desirably at least 75%, preferably at least 85%, 
more preferably at least 90%, still more preferably at least 95%, yet still more preferably at 
least 99%. Sequence comparisons are carried out using a Smith-Waterman sequence 
alignment algorithm (see e.g. Waterman, M.S., Introduction to Computational Biology: Maps, 
sequences and genomes. Chapman & Hall, London: 1995, ISBN 0-412-99391-0, or at 
http://www.hto.usc.edu/software/seqaln/index.html). The locals program, version 1.16, is 
used with the following parameters: match: 1, mismatch penalty: 0.33, open-gap penalty: 2, 
extended-gap penalty: 2. A nucleotide sequence "substantially similar" to a reference 
nucleotide sequence hybridizes to the reference nucleotide sequence in 7% sodium 
dodecyl sulfate (SDS), 0.5 M NaPO4, 1 mM EDTA at 50°C with washing in 2X SSC, 0.1% 
SDS at 50°C, more desirably in 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4, 1 mM 
EDTA at 50°C with washing in 1X SSC, 0.1% SDS at 50°C, more desirably still in 7% 
sodium dodecyl sulfate (SDS), 0.5 M NaPO4, 1 mM EDTA at 50°C with washing in 0.5X 
SSC, 0.1% SDS at 50°C, preferably in 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4, 1 
mM EDTA at 50°C with washing in 0.1X SSC, 0.1% SDS at 50°C, more preferably in 7% 
sodium dodecyl sulfate (SDS), 0.5 M NaPO4, 1 mM EDTA at 50°C with washing in 0.1X 
SSC, 0.1% SDS at 65°C. 
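
By way of illustration only, the sequence comparison described above can be sketched in a few lines of Python. This is not the locals program itself. It is a minimal Smith-Waterman local alignment that uses the stated match score (1) and mismatch penalty (0.33) and that assumes, because the open-gap and extended-gap penalties are both 2, that a gap of length k simply costs 2k. The function names and the way percent identity is computed (identical positions divided by the number of aligned columns) are assumptions made for this sketch.

```python
def smith_waterman(a, b, match=1.0, mismatch=0.33, gap=2.0):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Assumption: linear gap cost (gap * length), which corresponds to equal
    open-gap and extended-gap penalties.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else -mismatch
            cell = max(0.0,
                       score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                       score[i - 1][j] - gap,       # gap in b
                       score[i][j - 1] - gap)       # gap in a
            score[i][j] = cell
            if cell > best:
                best, best_pos = cell, (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else -mismatch
        if abs(score[i][j] - (score[i - 1][j - 1] + sub)) < 1e-9:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif abs(score[i][j] - (score[i - 1][j] - gap)) < 1e-9:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of all aligned columns."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)
```

The resulting percentage could then be compared against the thresholds recited above (at least 65%, 75%, 85%, 90%, 95% or 99%).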

The term "substantially similar", when used herein with respect to a protein, means a 
protein corresponding to a reference protein, wherein the protein has substantially the same 
structure and function as the reference protein, e.g. where only changes in amino acid 
sequence not affecting the polypeptide function occur. When used for a protein or an amino 
acid sequence, the percentage of identity between the substantially similar and the 
reference protein or amino acid sequence desirably is at least 65%, more desirably at least 
75%, preferably at least 85%, more preferably at least 90%, still more preferably at least 
95%, yet still more preferably at least 99%. 

Substrate: a substrate is the molecule that the enzyme naturally recognizes and 
converts to a product in the biochemical pathway in which the enzyme naturally carries out 
its function, or is a modified version of the molecule, which is also recognized by the 
enzyme and is converted by the enzyme to a product in an enzymatic reaction similar to the 
naturally-occurring reaction. 

Tolerance: the ability to continue normal growth or function when exposed to an 
inhibitor or herbicide. 

Transformation: a process for introducing heterologous DNA into a cell, tissue, or 
plant. Transformed cells, tissues, or plants are understood to encompass not only the end 
product of a transformation process, but also transgenic progeny thereof. 

Transgenic: stably transformed with a recombinant DNA molecule that preferably 
comprises a suitable promoter operatively linked to a DNA sequence of interest. 

BRIEF DESCRIPTION OF THE SEQUENCES IN THE SEQUENCE LISTING 
DNA sequence encoding the Arabidopsis AIR synthetase pre-protein 
amino acid sequence of the Arabidopsis AIR synthetase pre-protein 
DNA sequence encoding the putative mature Arabidopsis AIR synthetase 
amino acid sequence of the putative mature Arabidopsis AIR synthetase 
oligonucleotide JG-L 
oligonucleotide AS-1 
oligonucleotide AS-2 
oligonucleotide slp242 
oligonucleotide slp244 



DEPOSIT 

The following material has been deposited with the Agricultural Research Service, 
Patent Culture Collection (NRRL), 1815 North University Street, Peoria, Illinois 61604, under 
the terms of the Budapest Treaty on the International Recognition of the Deposit of 
Microorganisms for the Purposes of Patent Procedure. All restrictions on the availability of 
the deposited material will be irrevocably removed upon the granting of a patent. 
Clone          Accession number     Date of Deposit 

DH5αpASM       NRRL B-21976         April 17, 1998 

I. Correct Sequence of the Arabidopsis AIR Synthetase Gene 

The Arabidopsis AIR synthetase gene was re-sequenced by the inventors of the 
present invention and compared to a published DNA sequence for the Arabidopsis AIR 
synthetase gene (GenBank accession L12457; Senecoff and Meagher (1993) Plant Physiol. 
102: 387-399). Sequencing results revealed a substantial error in the published DNA 
sequence, resulting in the insertion of a cytosine base at the position corresponding to 
position 1,027 in SEQ ID NO:1. This insertion leads to a frame-shift mutation in the amino 
acid sequence and therefore teaches away from the correct deduced amino acid sequence 
for the Arabidopsis AIR synthetase. The present invention discloses for the first time the 
correct nucleotide sequence of the Arabidopsis AIR synthetase gene as well as the correct 
amino acid sequence of the Arabidopsis AIR synthetase. The nucleotide sequence 
encoding the pre-protein is set forth in SEQ ID NO:1 and the nucleotide sequence encoding 
the mature protein is set forth in SEQ ID NO:3. The correct amino acid sequence of the 
Arabidopsis AIR synthetase pre-protein encoded by the nucleotide sequence set forth in 
SEQ ID NO:1 is set forth in SEQ ID NO:2 and the correct amino acid sequence of the 
putative mature Arabidopsis AIR synthetase encoded by the nucleotide sequence set forth 
in SEQ ID NO:3 is set forth in SEQ ID NO:4. The nucleotide sequence encoding the 
Arabidopsis AIR synthetase pre-protein was deposited in E. coli strain DH5αpASM and 
designated as NRRL accession number B-21976. The present invention also encompasses 
an isolated amino acid sequence derived from a plant, wherein said amino acid sequence is 
identical or substantially similar to the amino acid sequence encoded by the nucleotide 
sequence set forth in SEQ ID NO:1 or SEQ ID NO:3, wherein said amino acid sequence 
has 5'-phosphoribosyl-5-aminoimidazole (AIR) synthetase activity. The present invention 
also further encompasses an isolated amino acid sequence derived from a plant, wherein 
said amino acid sequence is identical or substantially similar to the amino acid sequence set 
forth in SEQ ID NO:2 or SEQ ID NO:4, wherein said amino acid sequence has 
5'-phosphoribosyl-5-aminoimidazole (AIR) synthetase activity. 
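
The effect of a single inserted base on the deduced amino acid sequence can be illustrated with a short Python sketch. The toy sequence below is hypothetical and is not taken from SEQ ID NO:1 or from GenBank accession L12457; the function names are likewise assumptions made for illustration. The sketch translates a reading frame with the standard genetic code and shows that every codon downstream of an inserted cytosine is read differently.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate complete codons from the first base; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3          # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def insert_base(dna, position, base):
    """Insert `base` so that it occupies the given 1-based position."""
    return dna[:position - 1] + base + dna[position - 1:]


# Hypothetical toy open reading frame (illustration only).
correct = "ATGGCTGAAACCGGTTAA"
with_insertion = insert_base(correct, 7, "C")

print(translate(correct))         # frame as intended
print(translate(with_insertion))  # frame shifted after the inserted C
```

In this toy case the first two codons are unaffected, while all later codons, including the stop codon, are misread.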

II. Essentiality of the AIR Synthetase Gene in Plants Demonstrated by Antisense 
Inhibition 

As shown in the examples below, the essentiality of the AIR synthetase gene for 
normal plant growth and development has been demonstrated for the first time by antisense 
inhibition in plants using the antisense validation system described in PCT application no. 
EP98/07577, incorporated herein by reference. Having established the essentiality of AIR 
synthetase function in plants, the inventors thereby provide an important and sought after 
tool for new herbicide development. 

In the system described in the present invention, a hybrid transcription factor gene is 
made that comprises a DNA-binding domain and an activation domain. In addition, an 
activatable DNA construct is made that comprises a synthetic promoter operatively linked to 
an activatable DNA sequence. The hybrid transcription factor gene and synthetic promoter 
are selected or designed such that the DNA binding domain of the hybrid transcription 
factor is capable of binding specifically to the synthetic promoter, which then activates 
expression of the activatable DNA sequence. A first plant is transformed with the hybrid 
transcription factor gene, and a second plant is transformed with the activatable DNA 
construct. The first and second plants are crossed to produce a progeny plant 
containing both the sequence encoding the hybrid transcription factor and the synthetic 
promoter, wherein the activatable DNA sequence is expressed in the progeny plant. In the 
preferred embodiment, the activatable DNA sequence is an antisense sequence capable of 
inactivating expression of an endogenous gene such as the AIR synthetase gene. Hence, 
the progeny plant will be unable to normally express the endogenous gene. 

This antisense validation system is especially useful for allowing expression of traits 
that might otherwise be unrecoverable as constitutively driven transgenes. For instance, 
foreign genes with potentially lethal effect or antisense genes or dominant-negative 
mutations designed to abolish function of essential genes, while of great interest in basic 
studies of plant biology, present inherent experimental problems. Decreased transformation 
frequencies are often cited as evidence of lethality associated with a particular constitutively 
driven transgene, but negative results of this type are laden with alternative trivial 
explanations. The present invention is an important advancement in the field of agriculture 
because it allows stable maintenance and propagation of a test transgene separate from its 
expression. This ability to separate transgene insertion from expression is especially useful 
for firm conclusions about essentiality of gene function to be drawn. A substantial benefit of 
the present invention is that plant genes essential for normal growth or development can 
thus be identified in this manner. The identification of such genes provides useful targets for 
screening compound libraries for identification of effective herbicides. Below, the antisense 
validation system is described in greater detail: 

A. Hybrid Transcription Factor Gene 

A hybrid transcription factor gene for use in the antisense validation system 
described herein comprises DNA sequences encoding (1) a DNA-binding domain and (2) an 
activation domain that interacts with components of transcriptional machinery assembling at 
a promoter. Gene fragments are joined, typically such that the DNA binding domain is 
toward the 5' terminus and the activator domain is toward the 3' terminus, to form a hybrid 
gene whose expression produces a hybrid transcription factor. One skilled in the art is 
capable of routinely combining various DNA sequences encoding DNA binding domains 
with various DNA sequences encoding activation domains to produce a wide array of hybrid 
transcription factor genes. Examples of DNA sequences encoding DNA-binding domains 
include, but are not limited to, those encoding the DNA binding domains of GAL4, 
bacteriophage 434, lexA, lacI, and phage lambda repressor. Examples of DNA sequences 
encoding the activation domain include, but are not limited to, those encoding the acidic 
activation domains of herpes simplex VP16, maize C1, and P1. In addition, suitable 
activation domains can be isolated by fusing DNA pieces from an organism of choice to a 
suitable DNA binding domain and selecting directly for function (Estruch et al. (1994) 
Nucleic Acids Res. 22: 3983-3989). Domains of transcriptional activator proteins can be 
swapped between proteins of diverse origin (Brent and Ptashne (1985) Cell 43: 729-736). 
A preferable hybrid transcription factor gene comprises DNA sequences encoding the GAL4 
DNA binding domain fused to the maize C1 activation domain. 

B. Activatable DNA Construct 

An activatable DNA construct for use in the antisense validation system described 
herein comprises (1) a synthetic promoter operatively linked to (2) an activatable DNA 
sequence. The synthetic promoter comprises at least one DNA binding site recognized by 
the DNA binding domain of the hybrid transcription factor, and a minimal promoter, 
preferably a TATA element derived from a promoter recognized by plant cells. More 
particularly the TATA element is derived from a promoter recognized by the plant cell type 
into which the synthetic promoter will be incorporated. Desirably, the DNA binding site is 
repeated multiple times in the synthetic promoter so that the minimal promoter may be more 
effectively activated, such that the activatable DNA sequence associated with the synthetic 
promoter is more effectively expressed. One skilled in the art can use routine molecular 
biology and recombinant DNA technology to make desirable synthetic promoters. 
Examples of DNA binding sites that can be used to make synthetic promoters useful in the 
invention include, but are not limited to, the upstream activating sequence (UASG) 
recognized by the GAL4 DNA binding protein, the lac operator, and the lexA binding site. 
Examples of promoter TATA elements recognized by plant cells include those derived from 
CaMV 35S, the maize Bz1 promoter, and the UBQ3 promoter. An especially preferable 
synthetic promoter comprises a truncated CaMV 35S sequence containing the TATA 
element (nucleotides -59 to +48 relative to the start of transcription), fused at its 5' end to 
approximately 10 concatemeric direct repeats of the upstream activating sequence (UASG) 
recognized by the GAL4 DNA binding domain. 
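
The layout of such a synthetic promoter can be sketched schematically in Python. The sequences below are placeholders, not the actual UAS or CaMV 35S sequences: the binding site is written only in the general CGG-N11-CCG form of a GAL4 site, with N standing for any base, and the minimal promoter is a hypothetical stand-in. All function names are assumptions made for illustration.

```python
# Placeholder sequences (hypothetical; 'N' stands for any base).
UAS_SITE = "CGG" + "N" * 11 + "CCG"                  # general form of a GAL4 site
MINIMAL_PROMOTER = "N" * 20 + "TATATAA" + "N" * 20   # stand-in for truncated CaMV 35S

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Return the antisense orientation of a DNA sequence."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def build_synthetic_promoter(binding_site, minimal_promoter, repeats=10):
    """Fuse direct repeats of the binding site to the 5' end of the minimal promoter."""
    if repeats < 1:
        raise ValueError("at least one binding site is required")
    return binding_site * repeats + minimal_promoter


def build_activatable_construct(promoter, activatable_sequence):
    """Place the activatable DNA sequence downstream (3') of the synthetic promoter."""
    return promoter + activatable_sequence


promoter = build_synthetic_promoter(UAS_SITE, MINIMAL_PROMOTER, repeats=10)
# A hypothetical coding fragment placed in antisense orientation.
construct = build_activatable_construct(promoter, reverse_complement("ATGC"))
```

The sketch captures only the order of the elements (binding-site repeats, then minimal promoter, then activatable sequence); spacing, cloning sites and terminators are omitted.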

The activatable DNA sequence encompasses any DNA sequence for which stable 
introduction and expression in a plant cell is desired. Particularly desirable activatable DNA 
sequences are sense or antisense sequences, whose expression results in decreased 
expression of their endogenous counterpart genes, thereby inhibiting normal plant growth or 
development. The activatable DNA sequence is operatively linked to the synthetic promoter 
to form the activatable DNA construct. The activatable DNA sequence in the activatable 
DNA construct is not expressed, i.e. is silent, in transgenic lines, unless a hybrid 
transcription factor capable of binding to and activating the synthetic promoter, is also 
present. The activatable DNA construct subsequently is introduced into cells, tissues or 
plants to form stable transgenic lines expressing the activatable DNA sequence, as 
described more fully below. In the context of the present invention, the activatable DNA 
sequence preferably comprises an antisense AIR synthetase sequence. 

C. Transgenic Plants Containing the Hybrid Transcription Factor Gene or the 
Activatable DNA Construct 

The antisense validation system described herein utilizes a first plant containing the 
hybrid transcription factor gene and a second plant containing the activatable DNA 
construct. The hybrid transcription factor genes and activatable DNA constructs described 
above are introduced into the plants by methods well known and routinely used in the art, 
including but not limited to crossing, Agrobacterium-mediated transformation, Ti plasmid 
vectors, direct DNA uptake such as microprojectile bombardment, liposome mediated 
uptake, micro-injection, etc. Transformants are screened for the presence and functionality 
of the transgenes according to standard methods known to those skilled in the art. 

D. Transgenic Plants Containing Both the Hybrid Transcription Factor Gene and 
the Activatable DNA Construct 

F1 plants containing both the hybrid transcription factor gene and the activatable 
DNA construct are generated by cross-pollination and selected for the presence of an 
appropriate marker. In contrast to plants containing the activatable DNA construct alone, 
the F1 plants generate high levels of activatable DNA sequence expression product, 
comparable to those obtained with strong constitutive promoters such as CaMV 35S. 

E. Antisense Validation Assay 

Thus, a useful assay in the system described herein comprises the following steps: 
a) providing a first transgenic plant stably transformed with a hybrid transcription factor 
gene encoding a hybrid transcription factor capable of activating a synthetic promoter when 
said synthetic promoter is present in the plant, wherein the first transgenic plant is 
homozygous for the hybrid transcription factor; b) providing a second transgenic plant stably 
transformed with an activatable DNA construct comprising a synthetic promoter activatable 
by the hybrid transcription factor of step a) operatively linked to an activatable DNA 
sequence, such as an antisense AIR synthetase sequence; c) crossing the first transgenic 
plant with the second transgenic plant to yield F1 plants expressing the activatable DNA 
sequence in the presence of the hybrid transcription factor; and d) determining the effect of 
expression of the activatable DNA sequence on the F1 plants. 

III. Recombinant Production of AIR Synthetases and Uses Thereof 

For recombinant production of AIR synthetase in a host organism, a nucleotide 
sequence encoding an enzyme having AIR synthetase activity is inserted into an expression 
cassette designed for the chosen host and introduced into the host where it is 
recombinantly produced. The choice of specific regulatory sequences such as promoter, 
signal sequence, 5' and 3' untranslated sequences, and enhancer appropriate for the 
chosen host is within the level of skill of the routineer in the art. The resultant molecule, 
containing the individual elements linked in proper reading frame, may be inserted into a 
vector capable of being transformed into the host cell. Suitable expression vectors and 
methods for recombinant production of proteins are well known for host organisms such as 
E. coli, yeast, and insect cells (see, e.g., Luckow and Summers, Bio/Technol. 6: 47 (1988)). 
Specific examples include plasmids such as pBluescript (Stratagene, La Jolla, CA), pFLAG 
(International Biotechnologies, Inc., New Haven, CT), pTrcHis (Invitrogen, La Jolla, CA), and 
baculovirus expression vectors, e.g., those derived from the genome of Autographa 
californica nuclear polyhedrosis virus (AcMNPV). A preferred baculovirus/insect system is 
pVL1392/Sf21 cells (Invitrogen, La Jolla, CA). 

In a preferred embodiment, the nucleotide sequence encoding an enzyme having 
AIR synthetase activity is derived from a eukaryote, such as a mammal, a fly or a yeast, 
but is preferably derived from a plant. In a further preferred embodiment, the nucleotide 
sequence is identical or substantially similar to the nucleotide sequence set forth in SEQ ID 
NO:1 or SEQ ID NO:3, or encodes an enzyme having AIR synthetase activity, whose amino 
acid sequence is identical or substantially similar to the amino acid sequence set forth in 
SEQ ID NO:2 or SEQ ID NO:4. The nucleotide sequence set forth in SEQ ID NO:1 encodes 
the Arabidopsis AIR synthetase pre-protein, whose amino acid sequence is set forth in SEQ 
ID NO:2, and the nucleotide sequence set forth in SEQ ID NO:3 encodes the Arabidopsis 
putative mature AIR synthetase, whose amino acid sequence is set forth in SEQ ID NO:4. In 
another preferred embodiment, the nucleotide sequence is derived from a prokaryote, 
preferably a bacterium, e.g. E. coli. In this case, the enzyme having AIR synthetase activity is 
encoded by the purM gene. 

Recombinantly produced AIR synthetases are isolated and purified using a variety 
of standard techniques. The actual techniques that may be used will vary depending upon 
the host organism used, whether the enzyme is designed for secretion, and other such 
factors familiar to the skilled artisan (see, e.g. chapter 16 of Ausubel, F. et al., "Current 
Protocols in Molecular Biology", pub. by John Wiley & Sons, Inc. (1994)). 

Recombinantly produced AIR synthetases are useful for a variety of purposes. For 
example, they can be used in in vitro assays to screen known herbicidal chemicals whose 
target has not been identified to determine if they inhibit AIR synthetases. Such in vitro 
assays may also be used as more general screens to identify chemicals that inhibit such 
enzymatic activity and that are therefore novel herbicide candidates. Alternatively, 
recombinantly produced AIR synthetases may be used to elucidate the complex structure of 
these molecules and to further characterize their association with known inhibitors in order 
to rationally design new inhibitory herbicides as well as herbicide tolerant forms of the 
enzymes. 

IV. In Vitro Inhibitor Assay 

An in vitro assay useful for identifying inhibitors of enzymes encoded by essential 
plant genes, such as the AIR synthetase, preferably comprises the steps of: a) reacting an 
enzyme having AIR synthetase activity and a substrate thereof in the presence of a 
suspected inhibitor of the enzyme's function; b) comparing the rate of enzymatic activity in 
the presence of the suspected inhibitor to the rate of enzymatic activity under the same 
conditions in the absence of the suspected inhibitor; and c) determining whether the 
suspected inhibitor inhibits the AIR synthetase enzymatic activity. In a preferred 
embodiment, such a determination is made by comparing, in the presence and absence of 
the candidate inhibitor, the amount of AIR synthesized in the in vitro assay using 
fluorescence or absorbance detection. In another preferred embodiment, such a 
determination is made by comparing, in the presence and absence of the candidate 
inhibitor, the amount of ADP formed in the in vitro assay using fluorescence or absorbance 
detection. A preferred substrate for AIR synthetase is 
5'-phosphoribosyl-N-formylglycinamidine (FGAM), in particular the β isomer, β-FGAM. 
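
The comparison in steps b) and c) can be sketched as a simple calculation. The following Python fragment is an illustration only; the function names, the use of a single error margin, and the example numbers are assumptions and do not come from the assay itself. Rates are taken as the amount of AIR (or ADP) formed per unit time, measured under identical conditions with and without the candidate inhibitor.

```python
def reaction_rate(amount_start, amount_end, elapsed_time):
    """Product formed (AIR or ADP) per unit time."""
    if elapsed_time <= 0:
        raise ValueError("elapsed_time must be positive")
    return (amount_end - amount_start) / elapsed_time


def percent_inhibition(rate_with_inhibitor, rate_without_inhibitor):
    """Reduction in rate relative to the uninhibited control, in percent."""
    if rate_without_inhibitor <= 0:
        raise ValueError("control rate must be positive")
    return 100.0 * (1.0 - rate_with_inhibitor / rate_without_inhibitor)


def inhibits(rate_with_inhibitor, rate_without_inhibitor, error_margin):
    """True if the drop in rate exceeds the measurement's margin of error."""
    return (rate_without_inhibitor - rate_with_inhibitor) > error_margin


# Hypothetical readings in arbitrary units.
control = reaction_rate(0.0, 100.0, 10.0)   # without inhibitor
treated = reaction_rate(0.0, 25.0, 10.0)    # with candidate inhibitor
print(percent_inhibition(treated, control), inhibits(treated, control, 0.5))
```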

In another preferred embodiment, a coupled FGAM synthetase/AIR synthetase 
assay is used, thereby increasing the detection limit of the assay and resulting in an 
improved screening procedure for a chemical inhibiting AIR synthetase activity. Such a 
coupling assay preferably comprises the steps of: a) reacting an enzyme having 5'- 
phosphoribosyl-N-formylglycinamidine (FGAM) synthetase activity, an enzyme having AIR 
synthetase activity and a substrate of FGAM synthetase in the presence of a suspected 
inhibitor of the enzyme's function; b) comparing the rate of enzymatic activity in the 
presence of the suspected inhibitor to the rate of enzymatic activity under the same 
conditions in the absence of the suspected inhibitor; and c) determining whether the 
suspected inhibitor inhibits the AIR synthetase enzymatic activity. In a preferred 
embodiment, such a determination is made by comparing, in the presence and absence of 
the candidate inhibitor, the amount of AIR synthesized in the in vitro assay using 
fluorescence or absorbance detection. In another preferred embodiment, such a 
determination is made by comparing, in the presence and absence of the candidate 
inhibitor, the amount of ADP formed in the in vitro assay using fluorescence or absorbance 
detection. A preferred substrate for FGAM synthetase is 
5'-phosphoribosyl-N-formylglycinamide (FGAR), in particular the β isomer, β-FGAR. In a 
further preferred embodiment, the enzyme having FGAM synthetase activity is derived from 
a bacterium, and is preferably the E. coli FGAM synthetase encoded by the purL gene. The 
purL gene product is preferably recombinantly produced in E. coli. While any suitable AIR 
synthetase may be 
used, preferably the AIR synthetase used in such in vitro assays is derived from a plant. In 
another preferred embodiment, an assay coupling more than one enzymatic activity 
preceding AIR synthetase in the purine biosynthesis pathway is used. 

In a preferred embodiment, an enzyme used in an in vitro assay is derived from cells 
comprising the enzyme, preferably, from a crude extract of the cells. The enzyme is 
preferably isolated and purified from the cells or from the crude extract. The enzyme is 
preferably produced recombinantly and is preferably isolated and purified prior to being used in 
the assay. Chemicals identified in an in vitro assay are then tested for their ability to inhibit 
plant growth or viability. 

V. In Vivo Inhibitor Assay 

A. In one embodiment, a suspected herbicide, for example identified by in vitro 
screening, is applied to plants at various concentrations. The suspected herbicide is 
preferably sprayed on the plants. After application of the suspected herbicide, its effect on 
the plants, for example death or suppression of growth, is recorded. 

B. In another embodiment, an in vivo screening assay for inhibitors of the AIR 
synthetase activity uses transgenic plants, plant tissue, plant seeds or plant cells capable of 
overexpressing a nucleotide sequence encoding an enzyme having AIR synthetase activity, 
wherein the AIR synthetase is enzymatically active in the transgenic plants, plant tissue, 
plant seeds or plant cells. The nucleotide sequence is preferably derived from a eukaryote, 
such as a mammal, a fly or a yeast, but is preferably derived from a plant. In a further 
preferred embodiment, 
the nucleotide sequence is identical or substantially similar to the nucleotide sequence set 
forth in SEQ ID NO:1 or SEQ ID NO:3, or encodes an enzyme having AIR synthetase 
activity, whose amino acid sequence is identical or substantially similar to the amino acid 
sequence set forth in SEQ ID NO:2 or SEQ ID NO:4. In another preferred embodiment, the 
nucleotide sequence is derived from a prokaryote, preferably a bacterium, e.g. E. coli. In this 
case, the enzyme having AIR synthetase activity is encoded by the purM gene. 

A chemical is then applied to the transgenic plants, plant tissue, plant seeds or plant 
cells and to the isogenic non-transformed plants, plant tissue, plant seeds or plant cells, and 
the growth or viability of the transgenic and non-transformed plants, plant tissue, plant 
seeds or plant cells is determined after application of the chemical and compared. 

VI. Herbicide Tolerant Plants 

The present invention is further directed to plants, plant tissue, plant seeds, and 
plant cells tolerant to herbicides that inhibit the naturally occurring AIR synthetase activity in 
these plants, wherein the tolerance is conferred by an altered AIR synthetase activity. 
Altered AIR synthetase activity may be conferred upon a plant according to the invention by 
increasing expression of wild-type herbicide-sensitive AIR synthetase by providing 
additional wild-type AIR synthetase genes to the plant, by expressing modified 
herbicide-tolerant AIR synthetases in the plant, or by a combination of these techniques. 
Representative plants include any plants to which these herbicides are applied for their 
normally intended purpose. Preferred are agronomically important crops such as cotton, 
soybean, oilseed rape, sugar beet, maize, rice, wheat, barley, oats, rye, sorghum, millet, 
turf, forage, turf grasses, and the like. 

A. Increased Expression of Wild-Type AIR Synthetase 

Achieving altered AIR synthetase activity through increased expression results in a 
level of an AIR synthetase in the plant cell at least sufficient to overcome growth inhibition 
caused by the herbicide. The level of expressed enzyme generally is at least two times, 
preferably at least five times, and more preferably at least ten times the natively expressed 
amount. Increased expression may be due to multiple copies of a wild-type AIR synthetase 
gene; multiple occurrences of the coding sequence within the gene (i.e. gene amplification); 
or a mutation in the non-coding, regulatory sequence of the endogenous gene in the plant 
cell. Plants having such altered gene activity can be obtained by direct selection in plants 
by methods known in the art (see, e.g. U.S. Patent No. 5,162,602, and U.S. Patent No. 
4,761,373, and references cited therein). These plants also may be obtained by genetic 
engineering techniques known in the art. Increased expression of a herbicide-sensitive AIR 
synthetase gene can also be accomplished by transforming a plant cell with a recombinant 
or chimeric DNA molecule comprising a promoter capable of driving expression of an 
associated structural gene in a plant cell operatively linked to a homologous or heterologous 
structural gene encoding the AIR synthetase. Preferably, the transformation is stable, 
thereby providing a heritable transgenic trait. 

B. Expression of Modified Herbicide-Tolerant AIR Synthetases 
According to this embodiment, plants, plant tissue, plant seeds, or plant cells are 
stably transformed with a recombinant DNA molecule comprising a suitable promoter 
functional in plants operatively linked to a coding sequence encoding a herbicide tolerant 
form of an AIR synthetase. A herbicide tolerant form of the enzyme has at least one amino 
acid substitution, addition or deletion that confers tolerance to a herbicide that inhibits the 
unmodified, naturally occurring form of the enzyme. The transgenic plants, plant tissue, 
plant seeds, or plant cells thus created are then selected by conventional selection 
techniques, whereby herbicide tolerant lines are isolated, characterized, and developed. 
Below are described methods for obtaining genes that encode herbicide tolerant forms of 
AIR synthetases: 

One general strategy involves direct or indirect mutagenesis procedures on 
microbes. For instance, a genetically manipulatable microbe such as E. coli or S. cerevisiae 
may be subjected to random mutagenesis in vivo with mutagens such as UV light or ethyl or 
methyl methane sulfonate. Mutagenesis procedures are described, for example, in Miller, 
Experiments in Molecular Genetics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 
(1972); Davis et al., Advanced Bacterial Genetics, Cold Spring Harbor Laboratory, Cold 
Spring Harbor, NY (1980); Sherman et al., Methods in Yeast Genetics, Cold Spring Harbor 
Laboratory, Cold Spring Harbor, NY (1983); and U.S. Patent No. 4,975,374. The microbe 
selected for mutagenesis contains a normal, inhibitor-sensitive AIR synthetase gene and is 
dependent upon the activity conferred by this gene. The mutagenized cells are grown in 
the presence of the inhibitor at concentrations that inhibit the unmodified gene. Colonies of 
the mutagenized microbe that grow better than the unmutagenized microbe in the presence 
of the inhibitor (i.e. exhibit resistance to the inhibitor) are selected for further analysis. AIR 
synthetase genes from these colonies are isolated, either by cloning or by PCR 
amplification, and their sequences are elucidated. Sequences encoding altered gene 
products are then cloned back into the microbe to confirm their ability to confer inhibitor 
tolerance. 

A method of obtaining mutant herbicide-tolerant alleles of a plant AIR synthetase 
gene involves direct selection in plants. For example, the effect of a mutagenized AIR 
synthetase gene on the growth inhibition of plants such as Arabidopsis, soybean, or maize 
is determined by plating seeds sterilized by art-recognized methods on plates on a simple 
minimal salts medium containing increasing concentrations of the inhibitor. Such 
concentrations are in the range of 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 110, 300, 
1000 and 3000 parts per million (ppm). The lowest dose at which significant growth 
inhibition can be reproducibly detected is used for subsequent experiments. Determination 
of the lowest dose is routine in the art. 
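
The dose-finding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the 50% relative-growth cutoff, and the reading of "reproducibly" as "in every replicate" are hypothetical choices that do not come from the specification.

```python
def lowest_inhibitory_dose(growth_by_dose, control_growth, threshold=0.5):
    """Return the lowest dose (ppm) at which every replicate is inhibited.

    growth_by_dose: dict mapping dose in ppm to a list of replicate growth
        measurements (any consistent unit, e.g. seedling fresh weight).
    control_growth: mean growth of untreated seedlings in the same unit.
    threshold: hypothetical cutoff; growth at or below this fraction of the
        control counts as significant inhibition.
    """
    if control_growth <= 0:
        raise ValueError("control_growth must be positive")
    for dose in sorted(growth_by_dose):
        replicates = growth_by_dose[dose]
        # "Reproducibly detected" is modelled as: all replicates are inhibited.
        if replicates and all(g / control_growth <= threshold for g in replicates):
            return dose
    return None  # no tested dose inhibited growth in every replicate
```

For example, with made-up measurements {0.1: [9.8, 10.1], 0.3: [9.0, 4.0], 1: [4.5, 4.0]} and a control growth of 10.0, the function returns 1, because the 0.3 ppm dose inhibits only one of its two replicates.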

Mutagenesis of plant material is utilized to increase the frequency at which resistant 
alleles occur in the selected population. Mutagenized seed material is derived from a 
variety of sources, including chemical or physical mutagenesis of seeds, or chemical or 
physical mutagenesis of pollen (Neuffer, In Maize for Biological Research. Sheridan, ed. 
Univ. Press, Grand Forks, ND, pp. 61-64 (1982)), which is then used to fertilize plants and 
the resulting M1 mutant seeds collected. Typically for Arabidopsis, M2 seeds (Lehle Seeds, 
Tucson, AZ), which are progeny seeds of plants grown from seeds mutagenized with 
chemicals, such as ethyl methane sulfonate, or with physical agents, such as gamma rays 
or fast neutrons, are plated at densities of up to 10,000 seeds/plate (10 cm diameter) on 
minimal salts medium containing an appropriate concentration of inhibitor to select for 
tolerance. Seedlings that continue to grow and remain green 7-21 days after plating are 
transplanted to soil and grown to maturity and seed set. Progeny of these seeds are tested 
for tolerance to the herbicide. If the tolerance trait is dominant, plants whose seed 
segregate 3:1 (resistant:sensitive) are presumed to have been heterozygous for the 
resistance at the M2 generation. Plants that give rise to all resistant seed are presumed to 
have been homozygous for the resistance at the M2 generation. Such mutagenesis on 
intact seeds and screening of their M2 progeny seed can also be carried out on other 
species, for instance soybean (see, e.g. U.S. Pat. No. 5,084,082). Alternatively, mutant 
seeds to be screened for herbicide tolerance are obtained as a result of fertilization with 
pollen mutagenized by chemical or physical means. 
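
The 3:1 segregation check described above is commonly evaluated with a chi-square goodness-of-fit test. The sketch below is one way to do this in Python; the function names and the 5% significance level are assumptions made for illustration and are not prescribed by the specification. It applies only under the stated assumption that the tolerance trait is dominant.

```python
# Chi-square critical value for 1 degree of freedom at the 5% level.
CRITICAL_1DF_P05 = 3.841


def chi_square_3_to_1(resistant, sensitive):
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    total = resistant + sensitive
    if total == 0:
        raise ValueError("at least one progeny seedling must be scored")
    expected_r = 0.75 * total
    expected_s = 0.25 * total
    return ((resistant - expected_r) ** 2 / expected_r
            + (sensitive - expected_s) ** 2 / expected_s)


def presumed_m2_genotype(resistant, sensitive):
    """Classify the M2 parent from its progeny counts (dominant trait assumed)."""
    if resistant > 0 and sensitive == 0:
        return "homozygous"      # all progeny resistant
    if chi_square_3_to_1(resistant, sensitive) < CRITICAL_1DF_P05:
        return "heterozygous"    # counts are consistent with 3:1
    return "inconclusive"        # counts deviate from 3:1
```

For instance, 80 resistant and 20 sensitive progeny give a statistic of about 1.33, which is below the critical value, so the parent plant would be presumed heterozygous.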

Confirmation that the genetic basis of the herbicide tolerance is a modified AIR 
synthetase gene is ascertained as exemplified below. First, alleles of the AIR synthetase 
gene from plants exhibiting resistance to the inhibitor are isolated using PCR with primers 
based either upon the Arabidopsis cDNA coding sequences shown in SEQ ID NO:1 or, 
more preferably, based upon the unaltered AIR synthetase gene sequence from the plant 
used to generate tolerant alleles. After sequencing the alleles to determine the presence of 
mutations in the coding sequence, the alleles are tested for their ability to confer tolerance 
to the inhibitor on plants into which the putative tolerance-conferring alleles have been 
transformed. These plants can be either Arabidopsis plants or any other plant whose 
growth is susceptible to the AIR synthetase inhibitors. Second, the inserted AIR synthetase 
genes are mapped relative to known restriction fragment length polymorphisms (RFLPs) 
(see, for example, Chang et al., Proc. Natl. Acad. Sci. USA 85: 6856-6860 (1988); Nam et 
al., Plant Cell 1: 699-705 (1989)). The AIR synthetase inhibitor tolerance trait is 
independently mapped using the same markers. When tolerance is due to a mutation in 
that AIR synthetase gene, the tolerance trait maps to a position indistinguishable from the 
position of the AIR synthetase gene. 

Another method of obtaining herbicide-tolerant alleles of an AIR synthetase gene is 
by selection in plant cell cultures. Explants of plant tissue, e.g. embryos, leaf disks, etc. or 
actively growing callus or suspension cultures of a plant of interest are grown on medium in 
the presence of increasing concentrations of the inhibitory herbicide or an analogous 
inhibitor suitable for use in a laboratory environment. Varying degrees of growth are 
recorded in different cultures. In certain cultures, fast-growing variant colonies arise that 
continue to grow even in the presence of normally inhibitory concentrations of inhibitor. The 
frequency with which such faster-growing variants occur can be increased by treatment with 
a chemical or physical mutagen before exposing the tissues or cells to the inhibitor. Putative 
tolerance-conferring alleles of the AIR synthetase gene are isolated and tested as 
described in the foregoing paragraphs. Those alleles identified as conferring herbicide 
tolerance may then be engineered for optimal expression and transformed into the plant. 
Alternatively, plants can be regenerated from the tissue or cell cultures containing these 
alleles. 

Still another method involves mutagenesis of wild-type, herbicide sensitive plant AIR 
synthetase genes in bacteria or yeast, followed by culturing the microbe on medium that 
contains inhibitory concentrations of the inhibitor and then selecting those colonies that 
grow in the presence of the inhibitor. More specifically, a plant cDNA, such as the 
Arabidopsis cDNA encoding the AIR synthetase is cloned into a microbe that otherwise 
lacks the selected gene's activity. The transformed microbe is then subjected to in vivo 
mutagenesis or to in vitro mutagenesis by any of several chemical or enzymatic methods 
known in the art, e.g. sodium bisulfite (Shortle et al., Methods Enzymol. 100:457-468 
(1983)); methoxylamine (Kadonaga et al., Nucleic Acids Res. 13:1733-1745 (1985)); 
oligonucleotide-directed saturation mutagenesis (Hutchinson et al., Proc. Natl. Acad. Sci. 
USA 83:710-714 (1986)); or various polymerase misincorporation strategies (see, e.g. 
Shortle et al., Proc. Natl. Acad. Sci. USA 79:1588-1592 (1982); Shiraishi et al., Gene 
64:313-319 (1988); and Leung et al., Technique 1:11-15 (1989)). Colonies that grow in the 
presence of normally inhibitory concentrations of inhibitor are picked and purified by 
repeated restreaking. Their plasmids are purified and tested for the ability to confer 
tolerance to the inhibitor by retransforming them into the microbe lacking AIR synthetase 
gene activity. The DNA sequences of cDNA inserts from plasmids that pass this test are 
then determined. 

Herbicide resistant AIR synthetase enzymes are also obtained using methods 
involving in vitro recombination, also called DNA shuffling. By DNA shuffling, mutations, 
preferably random mutations, are introduced in AIR synthetase genes. DNA shuffling also 
leads to the recombination and rearrangement of sequences within an AIR synthetase 
gene or to recombination and exchange of sequences between two or more different 
AIR synthetase genes. These methods allow for the production of millions of mutated AIR 
synthetase genes. The mutated genes, or shuffled genes, are screened for desirable 
properties, e.g. improved tolerance to herbicides and for mutations that provide broad 
spectrum tolerance to the different classes of inhibitor chemistry. Such screens are well 
within the skills of a routineer in the art. 

In a preferred embodiment, a mutagenized AIR synthetase gene is formed from at 
least one template AIR synthetase gene, wherein the template AIR synthetase gene has 
been cleaved into double-stranded random fragments of a desired size, and comprising the 
steps of adding to the resultant population of double-stranded random fragments one or 
more single or double-stranded oligonucleotides, wherein said oligonucleotides comprise an 
area of identity and an area of heterology to the double-stranded random fragments; 
denaturing the resultant mixture of double-stranded random fragments and oligonucleotides 
into single-stranded fragments; incubating the resultant population of single-stranded 
fragments with a polymerase under conditions which result in the annealing of said single- 
stranded fragments at said areas of identity to form pairs of annealed fragments, said areas 
of identity being sufficient for one member of a pair to prime replication of the other, thereby 
forming a mutagenized double-stranded polynucleotide; and repeating the second and third 
steps for at least two further cycles, wherein the resultant mixture in the second step of a 
further cycle includes the mutagenized double-stranded polynucleotide from the third step 
of the previous cycle, and the further cycle forms a further mutagenized double-stranded 
polynucleotide, wherein the mutagenized polynucleotide is a mutated AIR synthetase gene 
having enhanced tolerance to a herbicide which inhibits naturally occurring AIR synthetase 
activity. In a preferred embodiment, the concentration of a single species of double- 
stranded random fragment in the population of double-stranded random fragments is less 
than 1% by weight of the total DNA. In a further preferred embodiment, the template 
double-stranded polynucleotide comprises at least about 100 species of polynucleotides. In 
another preferred embodiment, the size of the double-stranded random fragments is from 
about 5 bp to 5 kb. In a further preferred embodiment, the fourth step of the method 
comprises repeating the second and the third steps for at least 10 cycles. Such method is 
described e.g. in Stemmer et al. (1994) Nature 370: 389-391, in US Patent 5,605,793 and in 
Crameri et al. (1998) Nature 391: 288-291, as well as in WO 97/20078, and these 
references are incorporated herein by reference. 

In another preferred embodiment, any combination of two or more different AIR 
synthetase genes is mutagenized in vitro by a staggered extension process (StEP), as 
described e.g. in Zhao et al. (1998) Nature Biotechnology 16: 258-261. The two or more 
AIR synthetase genes are used as template for PCR amplification with the extension cycles 
of the PCR reaction preferably carried out at a lower temperature than the optimal 
polymerization temperature of the polymerase. For example, when a thermostable 
polymerase with an optimal temperature of approximately 72°C is used, the temperature for 
the extension reaction is desirably below 72°C, more desirably below 65°C, preferably 
below 60°C, more preferably the temperature for the extension reaction is 55°C. 
Additionally, the duration of the extension reaction of the PCR cycles is desirably shorter 
than usually carried out in the art, more desirably it is less than 30 seconds, preferably it is 
less than 15 seconds, more preferably the duration of the extension reaction is 5 seconds. 
Only a short DNA fragment is polymerized in each extension reaction, allowing template 
switch of the extension products between the starting DNA molecules after each cycle of 
denaturation and annealing, thereby generating diversity among the extension products. 
The optimal number of cycles in the PCR reaction depends on the length of the AIR 
synthetase coding regions to be mutagenized but desirably over 40 cycles, more desirably 
over 60 cycles, preferably over 80 cycles are used. Optimal extension conditions and the 
optimal number of PCR cycles for every combination of AIR synthetase genes are 
determined using procedures well known in the art. The other parameters 
for the PCR reaction are essentially the same as commonly used in the art. The primers for 
the amplification reaction are preferably designed to anneal to DNA sequences located 
outside of the coding sequence of the AIR synthetase genes, e.g. to DNA sequences of a 
vector comprising the AIR synthetase genes, whereby the different AIR synthetase genes 
used in the PCR reaction are preferably comprised in separate vectors. The primers 
desirably anneal to sequences located less than 500 bp away from the AIR synthetase 
coding sequences, preferably less than 200 bp away from the AIR synthetase coding 
sequences, more preferably less than 120 bp away from the AIR synthetase coding 
sequences. Preferably, the AIR synthetase coding sequences are surrounded by restriction 
sites, which are included in the DNA sequence amplified during the PCR reaction, thereby 
facilitating the cloning of the amplified products into a suitable vector. 
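
The way short extension steps combined with template switching generate diversity can be illustrated with a toy simulation. The Python sketch below is not a model of the actual reaction chemistry: it assumes equal-length, pre-aligned parent sequences, a fixed number of bases added per cycle, and a uniformly random choice of template after each denaturation and annealing step. The function name and all parameters are hypothetical.

```python
import random


def step_chimera(templates, bases_per_cycle, rng):
    """Build one full-length product by repeated short extensions.

    templates: equal-length parent sequences, assumed to be pre-aligned.
    bases_per_cycle: bases added per extension step; short extension
        times correspond to small values.
    rng: a random.Random instance, so that runs are repeatable.
    Returns the product sequence and the number of cycles used.
    """
    if bases_per_cycle < 1:
        raise ValueError("bases_per_cycle must be at least 1")
    length = len(templates[0])
    if any(len(t) != length for t in templates):
        raise ValueError("templates must have equal length")
    product = ""
    cycles = 0
    while len(product) < length:
        # After each denaturation/annealing step the growing strand may
        # anneal to any parent; this is modelled as a uniform random choice.
        parent = rng.choice(templates)
        product += parent[len(product):len(product) + bases_per_cycle]
        cycles += 1
    return product, cycles
```

In this toy model a smaller bases_per_cycle value means more cycles are needed to reach full length and more opportunities for template switching, which is in line with the preference stated above for short extension times and a high cycle number.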

In another preferred embodiment, fragments of AIR synthetase genes having 
cohesive ends are produced as described in WO 98/05765. The cohesive ends are 
produced by ligating a first oligonucleotide corresponding to a part of an AIR synthetase 
gene to a second oligonucleotide not present in the gene or corresponding to a part of the 
gene not adjoining to the part of the gene corresponding to the first oligonucleotide, wherein 
the second oligonucleotide contains at least one ribonucleotide. A double-stranded DNA is 
produced using the first oligonucleotide as template and the second oligonucleotide as 
primer. The ribonucleotide is cleaved and removed. The nucleotide(s) located 5' to the 
ribonucleotide is also removed, resulting in double-stranded fragments having cohesive 
ends. Such fragments are randomly reassembled by ligation to obtain novel combinations 
of gene sequences. 

Any AIR synthetase gene or any combination of AIR synthetase genes is used for in 
vitro recombination in the context of the present invention, for example, an AIR synthetase 
gene derived from a plant, such as, e.g., Arabidopsis thaliana, e.g. an AIR synthetase gene 
set forth in SEQ ID NO:1 or SEQ ID NO:3, an AIR synthetase gene from a bacterium, such as 
Bacillus subtilis (Ebbole and Zalkin (1987) J. Biol. Chem. 262: 8274-8287) or E. coli (Smith 
and Daum (1986) J. Biol. Chem. 261: 10632-10637), a human AIR synthetase gene (Aimi et 
al. (1990) Nucleic Acids Res. 18: 6665-6672), or an AIR synthetase gene from Drosophila 
(Henikoff et al. (1986) PNAS 289: 33-37) or from chicken (Chen et al. (1990) PNAS 87: 
3097-3101), all incorporated herein by reference. Whole AIR synthetase genes or portions 
thereof are used in the context of the present invention. The library of mutated AIR 
synthetase genes obtained by the methods described above is cloned into appropriate 
expression vectors and the resulting vectors are transformed into an appropriate host, for 
example an alga such as Chlamydomonas, a yeast or a bacterium. An appropriate host is 
preferably a host that otherwise lacks AIR synthetase gene activity, for example E. coli 
strain S066O9/IKC (Schnorr et al. (1994) Plant Journal 6: 113-121). Host cells transformed 
with the vectors comprising the library of mutated AIR synthetase genes are cultured on 
medium that contains inhibitory concentrations of the inhibitor and those colonies that grow 
in the presence of the inhibitor are selected. Colonies that grow in the presence of normally 
inhibitory concentrations of inhibitor are picked and purified by repeated restreaking. Their 
plasmids are purified and the DNA sequences of cDNA inserts from plasmids that pass this 
test are then determined. 

An assay for identifying a modified AIR synthetase gene that is tolerant to an 
inhibitor may be performed in the same manner as the assay to identify inhibitors of the AIR 
synthetase activity (Inhibitor Assay, above) with the following modifications: First, a mutant 
AIR synthetase is substituted in one of the reaction mixtures for the wild-type AIR 
synthetase of the inhibitor assay. Second, an inhibitor of wild-type enzyme is present in 
both reaction mixtures. Third, mutated activity (activity in the presence of inhibitor and 
mutated enzyme) and unmutated activity (activity in the presence of inhibitor and wild-type 
enzyme) are compared to determine whether a significant increase in enzymatic activity is 
observed in the mutated activity when compared to the unmutated activity. Mutated activity 
is any measure of activity of the mutated enzyme while in the presence of a suitable 
substrate and the inhibitor. Unmutated activity is any measure of activity of the wild-type 
enzyme while in the presence of a suitable substrate and the inhibitor. A significant 
increase is defined as an increase in enzymatic activity that is larger than the margin of 
error inherent in the measurement technique, preferably an increase by about 2-fold or 
greater of the activity of the wild-type enzyme in the presence of the inhibitor, more 
preferably an increase by about 5-fold or greater, most preferably an increase by about 
10-fold or greater. 
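
The comparison of mutated and unmutated activity can be written out as a small Python helper. This is a sketch under stated assumptions: the function name and tier labels are hypothetical, both activities are taken to be measured in the presence of inhibitor and in the same unit, and the "about" 2-, 5- and 10-fold levels in the text are treated as exact cutoffs purely for illustration.

```python
def tolerance_tier(mutated_activity, unmutated_activity, error_margin):
    """Grade the increase of mutated over unmutated activity.

    mutated_activity: activity of the mutant enzyme with inhibitor present.
    unmutated_activity: activity of the wild-type enzyme with inhibitor present.
    error_margin: margin of error of the measurement technique, same unit.
    """
    if unmutated_activity <= 0:
        raise ValueError("unmutated_activity must be positive to compute a fold change")
    # An increase only counts if it exceeds the measurement error.
    if mutated_activity - unmutated_activity <= error_margin:
        return "not significant"
    fold = mutated_activity / unmutated_activity
    if fold >= 10:
        return "most preferred"
    if fold >= 5:
        return "more preferred"
    if fold >= 2:
        return "preferred"
    return "significant"
```

For example, a mutant activity of 5.0 against a wild-type activity of 1.0 with an error margin of 0.1 (arbitrary units) is a 5-fold increase and falls in the "more preferred" tier.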

In addition to being used to create herbicide-tolerant plants, genes encoding 
herbicide tolerant AIR synthetases can also be used as selectable markers in plant cell 
transformation methods. For example, plants, plant tissue, plant seeds, or plant cells 
transformed with a transgene can also be transformed with a gene encoding an altered AIR 
synthetase capable of being expressed by the plant. The transformed cells are transferred 
to medium containing an inhibitor of the enzyme in an amount sufficient to inhibit the 
survivability of plant cells not expressing the modified gene, wherein only the transformed 
cells will survive. The method is applicable to any plant cell capable of being transformed 
with a modified AIR synthetase-encoding gene, and can be used with any transgene of 
interest. Expression of the transgene and the modified gene can be driven by the same 
promoter functional in plant cells, or by separate promoters. 

VII. Plant Transformation Technology 

A wild-type or herbicide-tolerant form of the AIR synthetase gene can be 
incorporated in plant or bacterial cells using conventional recombinant DNA technology. 
Generally, this involves inserting a DNA molecule encoding the AIR synthetase into an 
expression system to which the DNA molecule is heterologous (i.e., not normally present) 
using standard cloning procedures known in the art. The vector contains the necessary 
elements for the transcription and translation of the inserted protein-coding sequences in a 
host cell containing the vector. A large number of vector systems known in the art can be 
used, such as plasmids, bacteriophage viruses and other modified viruses. The 
components of the expression system may also be modified to increase expression. For 
example, truncated sequences, nucleotide substitutions or other modifications may be 
employed. Expression systems known in the art can be used to transform virtually any crop 
plant cell under suitable conditions. A transgene comprising a wild-type or 
herbicide-tolerant form of the AIR synthetase gene is preferably stably transformed and integrated 
into the genome of the host cells. In another preferred embodiment, the transgene 
comprising a wild-type or herbicide-tolerant form of the AIR synthetase gene is located on a 
self-replicating vector. Examples of self-replicating vectors are viruses, in particular gemini 
viruses. Transformed cells can be regenerated into whole plants such that the chosen form 
of the AIR synthetase gene confers herbicide tolerance in the transgenic plants. 

A. Requirements for Construction of Plant Expression Cassettes 
Gene sequences intended for expression in transgenic plants are first assembled in 
expression cassettes behind a suitable promoter expressible in plants. The expression 
cassettes may also comprise any further sequences required or selected for the expression 
of the transgene. Such sequences include, but are not restricted to, transcription 
terminators, extraneous sequences to enhance expression such as introns, viral sequences, 
and sequences intended for the targeting of the gene product to specific organelles and cell 
compartments. These expression cassettes can then be easily transferred to the plant 
transformation vectors described infra. The following is a description of various 
components of typical expression cassettes. 

1. Promoters 

The selection of the promoter used in expression cassettes will determine the spatial 
and temporal expression pattern of the transgene in the transgenic plant. Selected 
promoters will express transgenes in specific cell types (such as leaf epidermal cells, 
mesophyll cells, root cortex cells) or in specific tissues or organs (roots, leaves or flowers, 
for example) and the selection will reflect the desired location of accumulation of the gene 
product. Alternatively, the selected promoter may drive expression of the gene under 
various inducing conditions. Promoters vary in their strength, i.e., ability to promote 
transcription. Depending upon the host cell system utilized, any one of a number of suitable 
promoters known in the art can be used. For example, for constitutive expression, the 
CaMV 35S promoter, the rice actin promoter, or the ubiquitin promoter may be used. For 
regulatable expression, the chemically inducible PR-1 promoter from tobacco or Arabidopsis 
may be used (see, e.g., U.S. Patent No. 5,689,044). 

2. Transcriptional Terminators 

A variety of transcriptional terminators are available for use in expression cassettes. 
These are responsible for the termination of transcription beyond the transgene and its 
correct polyadenylation. Appropriate transcriptional terminators are those that are known to 
function in plants and include the CaMV 35S terminator, the tml terminator, the nopaline 
synthase terminator and the pea rbcS E9 terminator. These can be used in both 
monocotyledonous and dicotyledonous plants. 

3. Sequences for the Enhancement or Regulation of Expression 

Numerous sequences have been found to enhance gene expression from within the 
transcriptional unit and these sequences can be used in conjunction with the genes of this 
invention to increase their expression in transgenic plants. For example, various intron 
sequences such as introns of the maize Adh1 gene have been shown to enhance 
expression, particularly in monocotyledonous cells. In addition, a number of non-translated 
leader sequences derived from viruses are also known to enhance expression, and these 
are particularly effective in dicotyledonous cells. 

4. Coding Sequence Optimization 

The coding sequence of the selected gene may be genetically engineered by 
altering the coding sequence for optimal expression in the crop species of interest. 
Methods for modifying coding sequences to achieve optimal expression in a particular crop 
species are well known (see, e.g. Perlak et al., Proc. Natl. Acad. Sci. USA 88: 3324 (1991); 
and Koziel et al., Bio/technol. 11: 194 (1993)). 

5. Targeting of the Gene Product Within the Cell 

Various mechanisms for targeting gene products are known to exist in plants and 
the sequences controlling the functioning of these mechanisms have been characterized in 
some detail. For example, the targeting of gene products to the chloroplast is controlled by 
a signal sequence found at the amino terminal end of various proteins which is cleaved 
during chloroplast import to yield the mature protein (e.g. Comai et al. J. Biol. Chem. 263: 
15104-15109 (1988)). Other gene products are localized to other organelles such as the 
mitochondrion and the peroxisome (e.g. Unger et al. Plant Molec. Biol. 13: 411-418 (1989)). 
The cDNAs encoding these products can also be manipulated to effect the targeting of 
heterologous gene products to these organelles. In addition, sequences have been 
characterized which cause the targeting of gene products to other cell compartments. 
Amino terminal sequences are responsible for targeting to the ER, the apoplast, and 
extracellular secretion from aleurone cells (Koehler & Ho, Plant Cell 2: 769-783 (1990)). 
Additionally, amino terminal sequences in conjunction with carboxy terminal sequences are 
responsible for vacuolar targeting of gene products (Shinshi et al. Plant Molec. Biol. 14: 
357-368 (1990)). By the fusion of the appropriate targeting sequences described above to 
transgene sequences of interest it is possible to direct the transgene product to any 
organelle or cell compartment. 

B. Construction of Plant Transformation Vectors 

Numerous transformation vectors available for plant transformation are known to 
those of ordinary skill in the plant transformation arts, and the genes pertinent to this 
invention can be used in conjunction with any such vectors. The selection of vector will 
depend upon the preferred transformation technique and the target species for 
transformation. For certain target species, different antibiotic or herbicide selection markers 
may be preferred. Selection markers used routinely in transformation include the nptII 
gene, which confers resistance to kanamycin and related antibiotics (Messing & Vierra, 
Gene 19: 259-268 (1982); Bevan et al., Nature 304:184-187 (1983)), the bar gene, which 
confers resistance to the herbicide phosphinothricin (White et al., Nucl. Acids Res. 18: 1062 
(1990); Spencer et al., Theor. Appl. Genet. 79: 625-631 (1990)), the hph gene, which confers 
resistance to the antibiotic hygromycin (Blochinger & Diggelmann, Mol. Cell Biol. 4: 
2929-2931), the dhfr gene, which confers resistance to methotrexate (Bourouis et al., EMBO 
J. 2(7): 1099-1104 (1983)), and the EPSPS gene, which confers resistance to glyphosate 
(U.S. Patent Nos. 4,940,935 and 5,188,642). 

1. Vectors Suitable for Agrobacterium Transformation 

Many vectors are available for transformation using Agrobacterium tumefaciens. 
These typically carry at least one T-DNA border sequence and include vectors such as 
pBIN19 (Bevan, Nucl. Acids Res. (1984)) and pXYZ. Typical vectors suitable for 
Agrobacterium transformation include the binary vectors pCIB200 and pCIB2001, as well as 
the binary vector pCIB10 and hygromycin selection derivatives thereof. (See, for example, 
U.S. Patent No. 5,639,949.) 

2. Vectors Suitable for non-Agrobacterium Transformation 

Transformation without the use of Agrobacterium tumefaciens circumvents the 
requirement for T-DNA sequences in the chosen transformation vector and consequently 
vectors lacking these sequences can be utilized in addition to vectors such as the ones 
described above which contain T-DNA sequences. Transformation techniques that do not 
rely on Agrobacterium include transformation via particle bombardment, protoplast uptake 
(e.g. PEG and electroporation) and microinjection. The choice of vector depends largely on 
the preferred selection for the species being transformed. Typical vectors suitable for non- 
Agrobacterium transformation include pCIB3064, pSOG19, and pSOG35. (See, for 
example, U.S. Patent No. 5,639,949.) 

C. Transformation Techniques 

Once the coding sequence of interest has been cloned into an expression system, it 
is transformed into a plant cell. Methods for transformation and regeneration of plants are 
well known in the art. For example, Ti plasmid vectors have been utilized for the delivery of 
foreign DNA, as well as direct DNA uptake, liposomes, electroporation, micro-injection, and 
microprojectiles. In addition, bacteria from the genus Agrobacterium can be utilized to 
transform plant cells. 

Transformation techniques for dicotyledons are well known in the art and include 
Agrobacterium-based techniques and techniques that do not require Agrobacterium. Non- 
Agrobacterium techniques involve the uptake of exogenous genetic material directly by 
protoplasts or cells. This can be accomplished by PEG or electroporation mediated uptake, 
particle bombardment-mediated delivery, or microinjection. In each case the transformed 
cells are regenerated to whole plants using standard techniques known in the art. 

Transformation of most monocotyledon species has now also become routine. 
Preferred techniques include direct gene transfer into protoplasts using PEG or 
electroporation techniques, particle bombardment into callus tissue, as well as 
Agrobacterium-mediated transformation. 

VIII. Breeding 

The wild-type or altered form of an AIR synthetase gene of the present invention can 
be utilized to confer herbicide tolerance to a wide variety of plant cells, including those of 
gymnosperms, monocots, and dicots. Although the gene can be inserted into any plant cell 
falling within these broad classes, it is particularly useful in crop plant cells, such as rice, 
wheat, barley, rye, corn, potato, carrot, sweet potato, sugar beet, bean, pea, chicory, 
lettuce, cabbage, cauliflower, broccoli, turnip, radish, spinach, asparagus, onion, garlic, 
eggplant, pepper, celery, carrot, squash, pumpkin, zucchini, cucumber, apple, pear, quince, 
melon, plum, cherry, peach, nectarine, apricot, strawberry, grape, raspberry, blackberry, 
pineapple, avocado, papaya, mango, banana, soybean, tobacco, tomato, sorghum and 
sugarcane. 

The high-level expression of a wild-type AIR synthetase gene and/or the expression 
of herbicide-tolerant forms of an AIR synthetase gene conferring herbicide tolerance in 
plants, in combination with other characteristics important for production and quality, can be 
incorporated into plant lines through breeding approaches and techniques known in the art. 

Where a herbicide tolerant AIR synthetase gene allele is obtained by direct selection 
in a crop plant or plant cell culture from which a crop plant can be regenerated, it is moved 
into commercial varieties using traditional breeding techniques to develop a herbicide 
tolerant crop without the need for genetically engineering the allele and transforming it into 
the plant. 

The invention will be further described by reference to the following detailed 
examples. These examples are provided for purposes of illustration only, and are not 
intended to be limiting unless otherwise specified. 

EXAMPLES 

Standard recombinant DNA and molecular cloning techniques used here are well 
known in the art and are described by Sambrook et al., Molecular Cloning, eds., Cold 
Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1989) and by T.J. Silhavy, M.L. 
Berman, and L.W. Enquist, Experiments with Gene Fusions, Cold Spring Harbor 
Laboratory, Cold Spring Harbor, NY (1984) and by Ausubel, F.M. et al., Current Protocols in 
Molecular Biology, pub. by Greene Publishing Assoc. and Wiley-Interscience (1987). 

Example 1: Construction of a Vector Containing a GAL4 Binding Site/Minimal 35S CaMV 
Promoter Fused to Antisense AIR Synthetase 

pAT71: 

GAL4 binding sites and the minimal 35S promoter (-59 to +1) are excised from 
pGALLuc2 (Goff et al. (1991) Genes & Development 5: 298-309) as an EcoRI-PstI 
fragment and inserted into the respective sites of pBluescript, yielding pAT52. pAT66 is 
constructed with a three-way ligation between the HindIII-PstI fragment of pAT52, a PstI- 
EcoRI fragment of pCIB1716 (contains a 35S untranslated leader, GUS gene, 35S 
terminator) and HindIII-EcoRI cut pUC18. The 35S leader of pAT66 is excised with PstI- 
NcoI and replaced with a PCR-generated 35S leader extending from +1 to +48 to yield 
pAT71. 

pJG304:

Plasmid pBS SK+ (Stratagene, La Jolla, CA) is linearized with SacI, treated with
mung bean nuclease to remove the SacI site, and re-ligated with T4 ligase to make
pJG201. The 10XGAL4 consensus binding site/CaMV 35S minimal promoter/GUS
gene/CaMV terminator cassette is removed from pAT71 with KpnI and cloned into the KpnI
site of pJG201 to make pJG304.

pJG304 is partially digested with restriction endonuclease Asp718 to isolate a
full-length linear fragment. This fragment is ligated with a molar excess of the 22 base
oligonucleotide JG-L (5' GTACCTCGAG TCTAGACTCG AG 3', SEQ ID NO:5). Restriction
analysis is used to identify a clone with this linker inserted 5' to the GAL4 DNA binding site,
and this plasmid is designated pJG304ΔXhoI.

pDG3:

A fragment of the 5'-phosphoribosyl-5-aminoimidazole (AIR) synthetase cDNA clone
(Senecoff and Meagher (1993) Plant Physiology 102: 387-399) is PCR-amplified from the
Arabidopsis thaliana cDNA plasmid library pFL61 (Minet et al. (1992) Plant J. 2: 417-422)
using the oligonucleotides AS-1 (5' GAT CQA OCT CGT TOT CTTCTQ TQT CAT C 3',
SEQ ID NO:6) and AS-2 (5' GAT CCO ATG GTC OCC AGG TAA AGA OGT C 3', SEQ
ID NO:7).

The vector pJG304ΔXhoI is digested with SacI and NcoI to excise the GUS gene
coding sequence. The AIR synthetase PCR fragment is digested with SacI and NcoI and
ligated into pJG304ΔXhoI to make pDG3.
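
The JG-L linker (SEQ ID NO:5) used in the pJG304 step above can be checked computationally. The following Python sketch is illustrative only: the helper names are hypothetical, the XhoI (CTCGAG) and XbaI (TCTAGA) recognition sequences are standard values rather than something stated in this example, and reading the GTAC prefix as an Asp718-compatible 5' overhang is an assumption. It confirms the stated length of 22 bases, locates the recognition sequences, and tests whether the 18 bases after GTAC are self-complementary, in which case two copies of the oligonucleotide could anneal to each other.

```python
# Sanity check of the JG-L linker (SEQ ID NO:5) as printed in the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[int]:
    """Return the 0-based start position of every occurrence of `site`."""
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


JG_L = "GTACCTCGAGTCTAGACTCGAG"
XHOI = "CTCGAG"   # standard XhoI recognition sequence (not given in the text)
XBAI = "TCTAGA"   # standard XbaI recognition sequence (not given in the text)

# Assumed reading: the first four bases form the single-stranded overhang.
overhang, core = JG_L[:4], JG_L[4:]

print("length:", len(JG_L))
print("XhoI sites at:", find_sites(JG_L, XHOI))
print("XbaI sites at:", find_sites(JG_L, XBAI))
print("core is self-complementary:", core == reverse_complement(core))
```

A self-complementary core means a single oligonucleotide is enough to form the double-stranded linker, which is consistent with only one linker sequence being listed.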

Example 2: Plant Transformation Vectors for AIR Synthetase Antisense Expression 
from the GAL4 Binding Site/CaMV Minimal 35S Promoter 

pJG261:

Vector pGPTV (Becker, et al., (1992) Plant Molecular Biology 20: 1195-1197) is
digested with EcoRI and HindIII to remove the nopaline synthase promoter/GUS cassette.
Concurrently, the superlinker is excised from pSE380 (Invitrogen, San Diego, CA) with
EcoRI and HindIII and cloned into the EcoRI/HindIII linearized pGPTV to make pJG261.

pDG4:

pDG3 is cut with XhoI to excise the cassette containing the GAL4 DNA binding
site/35S minimal promoter/antisense AIR synthetase/CaMV terminator fusion. This cassette
is ligated into XhoI digested pJG261 such that transcription is divergent from that of the
BAR selectable marker, producing pDG4.

Example 3: Production of GAL4 Binding Site/Minimal CaMV 35S Antisense AIR
Synthetase Transgenic Plants 

pDG4 is electro-transformed (Bio-Rad Laboratories, Hercules, CA) into
Agrobacterium tumefaciens strain C58C1 (pMP90), and Arabidopsis plants (Ecotype
Columbia) are transformed by infiltration (Bechtold et al., C. R. Acad. Sci. Paris 316:
1188-93 (1993)). Seeds from the infiltrated plants are selected on germination medium
(Murashige-Skoog salts at 4.3 g/liter, Mes at 0.5 g/liter, 1% sucrose, thiamine at 10 µg/liter,
pyridoxine at 5 µg/liter, nicotinic acid at 5 µg/liter, myo-inositol at 1 mg/liter, pH 5.8)
containing Basta at 15 mg/liter.
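
The selection medium above is given per liter, so preparing another volume is a matter of scaling. The Python sketch below is a convenience calculation only: the function name is hypothetical, "1% sucrose" is taken to mean 10 g/liter (w/v), which is an assumption, and the pH adjustment to 5.8 is not modeled.

```python
# Per-liter recipe transcribed from Example 3; all values in grams per liter.
RECIPE_G_PER_L = {
    "Murashige-Skoog salts": 4.3,
    "Mes": 0.5,
    "sucrose": 10.0,          # 1% read as 10 g/liter (w/v): an assumption
    "thiamine": 10e-6,        # 10 micrograms per liter
    "pyridoxine": 5e-6,       # 5 micrograms per liter
    "nicotinic acid": 5e-6,   # 5 micrograms per liter
    "myo-inositol": 1e-3,     # 1 mg per liter
    "Basta": 15e-3,           # 15 mg per liter
}


def amounts_for(volume_l: float) -> dict[str, float]:
    """Return grams of each component needed for `volume_l` liters of medium."""
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    return {name: conc * volume_l for name, conc in RECIPE_G_PER_L.items()}


# Example: a 0.5 liter batch, reported in milligrams.
for name, grams in amounts_for(0.5).items():
    print(f"{name}: {grams * 1000:.4g} mg")
```

For the microgram-scale vitamins, a concentrated stock solution is usually more practical than weighing; that choice is left to the user.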

Example 4: Production of GAL4/C1 Transactivator Transgenic Plants 

pSGZLI is constructed by ligating the GAL4-C1 EcoRI fragment from pGALCI
(Goff, et al., (1991) Genes & Development 5: 298-309) into the EcoRI site of pIC20H. The
GAL4-C1 fragment of pSGZLI is excised with BamHI-BglII and inserted into the BamHI site
of pCIB770 (Rothstein, et al., (1987) Gene 53: 153-161) yielding pAT53.

Arabidopsis root explants are transformed with pAT53 as described in Valvekens, et
al., (1988) PNAS USA 85: 5536-5540. Transgenic plants with single site insertion and
positive for GAL4/C1 expression are taken to homozygosity.

Example 5: Antisense Inhibition of AIR Synthetase Using a GAL4/C1 Transactivator 
and a GAL4 Binding Site/Minimal CaMV 35S Promoter 

Fifteen transgenic plants containing the GAL4 binding site/minimal CaMV 35S 
promoter/antisense AIR synthetase construct are transplanted to soil and grown to maturity 
in the greenhouse. Flowers borne on the primary transformants are crossed to pollen from 
the homozygous GAL4/C1 transactivator line pAT53-103. F1 seeds are plated on 
germination medium and germination medium containing 15 mg/liter Basta. Seedlings from
five F1 lines are transplanted to soil and grown to maturity in the greenhouse. Half of the
seedlings from two F1 lines die while in soil. Half of the seedlings from three F1 lines are
bleached and severely retarded in growth. These results show that the AIR synthetase gene 
is essential in plants. 
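
The observation that half of the F1 seedlings are affected matches a simple segregation expectation. The Python sketch below rests on an assumption that the text does not state: each primary transformant carries the antisense construct at a single locus in the hemizygous state. The activator parent is homozygous, as described above. The function name and allele symbols are hypothetical.

```python
from fractions import Fraction
from itertools import product


def f1_fraction_with_both(target_parent, activator_parent):
    """Fraction of F1 progeny inheriting both transgenes.

    Each parent is described by the two alleles at its own transgene locus:
    "T" for the transgene, "-" for no insertion. One allele from each parent
    is passed to each F1 plant with equal probability.
    """
    outcomes = list(product(target_parent, activator_parent))
    both = sum(1 for t, a in outcomes if t == "T" and a == "T")
    return Fraction(both, len(outcomes))


# Hemizygous antisense parent (assumed) x homozygous GAL4/C1 activator parent.
print(f1_fraction_with_both(("T", "-"), ("T", "T")))
```

Under these assumptions every F1 plant receives the transactivator and one in two also receives the antisense target, so one in two is expected to show the phenotype.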

Example 6: Expression of Recombinant Plant AIR Synthetase in E. coli

An Arabidopsis thaliana (Landsberg) cDNA library in the plasmid vector pFL61
(Minet et al., Plant J. 2: 417-422 (1992)) is obtained and amplified. PCR primers to amplify
protein coding sequence of Arabidopsis AIR synthetase are designed from a published
DNA sequence (Genbank accession L12457, Senecoff and Meagher, Plant Physiol. 102:
387-399 (1993)) and used to amplify the AIR synthetase coding sequence from the plasmid
library with Pfu DNA polymerase (Stratagene). Sequencing of the PCR products reveals an
error in the published DNA sequence resulting in the insertion of a cytosine base at the
position corresponding to position [...] of SEQ ID NO:1, resulting in an incorrect predicted
[...]. For the coding region of the AIR synthetase pre-protein, primers slp242 (5' CGC GGA
TCC CTA CTG ATA GCT TAC GCC TTC ACC 3', SEQ ID NO:8) and slp244 (5' TTQ AAG
CCA TGG AAG CTC GGA TTT TG 3', SEQ ID NO:9) are used, and for the construct including
[...]. The coding regions of the pre-protein and of the putative mature protein are
subcloned into the expression vector pET32a (Novagen) and both are transformed into
DE3 pLysS (Novagen) by electroporation using the Biorad Gene Pulser and the
manufacturer's conditions.
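
The primers for the pre-protein construct can be compared with SEQ ID NO:1 as a consistency check. This Python sketch assumes the readings of slp242 and of the 3' portion of slp244 shown in the code, with bases that are ambiguous in the printed text resolved against SEQ ID NO:1. The 5' tail of slp244 is left out because it cannot be checked that way. Helper names are hypothetical, and the BamHI (GGATCC) and NcoI (CCATGG) recognition sequences are standard values that this example does not name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Ends of SEQ ID NO:1: the first 20 bases, and the last eight codons of the
# coding sequence including the stop codon.
SEQ1_START = "CCATGGAAGCTCGGATTTTG"
SEQ1_CDS_END = "GGTGAAGGCGTAAGCTATCAGTAG"

# Primer readings assumed for this check (see the lead-in).
SLP242 = "CGCGGATCCCTACTGATAGCTTACGCCTTCACC"
SLP244_3PRIME = "CCATGGAAGCTCGGATTTTG"

# slp242 should end with the reverse complement of the end of the CDS.
print("slp242 anneals to CDS end:",
      SLP242.endswith(reverse_complement(SEQ1_CDS_END)))
print("slp242 carries GGATCC:", "GGATCC" in SLP242)

# The 3' portion of slp244 should match the start of SEQ ID NO:1.
print("slp244 3' end matches CDS start:", SLP244_3PRIME == SEQ1_START)
print("slp244 carries CCATGG:", "CCATGG" in SLP244_3PRIME)
```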

Example 7: Growth and Extraction of FGAM Synthetase 

E. coli strain TX635/pJS113 (Schendel et al. (1989) Biochemistry 28, 2459-2471) is
grown in Luria broth (LB) containing 50 µg/mL carbenicillin at 30°C in an incubator/shaker.
When the cells reach an optical density of approximately 1 OD at 600 nm, an equal volume
of LB carbenicillin at [...]°C is added to heat-shock the cells. Subsequently the cells are
placed in an incubator-shaker and grown at [...]°C. The cells are harvested at the end of log
phase using low speed centrifugation. The centrifuge bottle is inverted and the media is
allowed to drain. The cell pellet is resuspended with a small paintbrush in buffer A (50 mM
[...] pH 7.5, 1 mM EDTA, 2 mM DTT, 150 mM KCl, 10% glycerol) and then disrupted in a
French pressure cell at 18,000 PSI. Following a high speed centrifugation to remove cell
debris the enzyme is precipitated with ammonium sulfate (40-60%) and the pellets stored
at [...]°C. The enzyme is resuspended in a small volume of Buffer A and applied to a
Sephadex G-25 column for desalting into Buffer A. The activity is assayed as described
below.

Example 8: Growth and Extraction of AIR Synthetase

E. coli strain pJS24/Tx393 (Schrimsher et al. (1986) Biochemistry 25, 4366-4371)
containing multiple gene copies of the native AIR synthetase is grown in LB broth containing
50 µg/mL of carbenicillin at 37°C in an incubator-shaker. The cells are harvested at the end
of the log phase of growth and pelleted in a centrifuge at low speed. The growth media is
discarded and the centrifuge bottle is inverted and allowed to drain. The cells are
resuspended in buffer A with a small paintbrush and disrupted in a French Pressure Cell at
approximately 18,000 PSI. Following a high speed centrifugation to pellet cell debris, the
supernatant is precipitated with ammonium sulfate and stored at -80°C.

Example 9: AIR Synthetase Activity Assay 

The AIR synthetase activity assay is essentially derived from Schrimsher et al.
(1986) Biochemistry 25, 4356-4365. The reaction volumes are preferably the ones
described below, but can be varied depending on the experimental requirements. 0.2-1.0 x
10^[...] unit of an enzyme having AIR synthetase activity (one unit of activity is defined as the
amount of enzyme required to produce 1 mmol/min of product) and 0.1 mM
5'-phosphoribosyl-N-formylglycinamidine (FGAM) are mixed in a final volume of 96 ml 50 mM
HEPES (pH 7.4-8.1, but preferably 7.7), 20 mM MgCl2, 150 mM KCl and 0.01-10 mM, but
preferably 2.0 mM ATP. The production of AIR is determined preferably according to
Bratton and Marshall (J. Biol. Chem. (1939) 128, 537-550) by adding 32 ml of 1.33 M
potassium phosphate in 20% (w/v) trichloroacetic acid (pH 1.4). The mixture is centrifuged
to remove precipitated protein and 32 ml of 0.1% (w/v) sodium nitrite is added. After 3 min,
32 ml of 0.5% (w/v) ammonium sulfamate is added and, after an additional minute, 8 ml of
25% N-(1-naphthyl)ethylenediamine dihydrochloride is added. The absorbance is measured
at 530 nm after 10 min.

Alternatively, ADP formation is quantitated by a coupled reaction procedure. In this
case, 3.5 units of pyruvate kinase, 4.7 units of lactate dehydrogenase, 1.0 mM
phosphoenolpyruvate and 0.2 mM NADH are added and absorbance is measured at 340 nm.

Example 10: Coupled FGAM Synthetase and AIR Synthetase Enzyme Assays

A. FGAM synthetase assay

The conversion of FGAR to FGAM is followed by detecting the concomitant
formation of ADP. The ADP formation is followed utilizing the enzymes pyruvate kinase
and lactate dehydrogenase (reagent enzymes) and detecting the conversion of NADH to
NAD+ in the presence of phosphoenolpyruvate (PEP). This is monitored at 340 nm.
Pyruvate kinase and PEP facilitate the regeneration of ATP from ADP. ATP is a required
substrate for both FGAM synthetase and AIR synthetase. The assay buffer is buffer A with
the addition of 20 mM MgCl2.
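
In the coupled assay, one NADH is oxidized for each ADP that pyruvate kinase converts back to ATP, so the rate of absorbance decrease at 340 nm can be converted to a rate of ADP formation with the Beer-Lambert law. The Python sketch below is an illustration under stated assumptions: it uses the commonly cited molar absorptivity of NADH at 340 nm (6220 per M per cm), which the text does not give, and a default path length of 1 cm, which will generally not hold in a microtiter well and has to be measured or calibrated. The function name is hypothetical.

```python
# Commonly cited molar absorptivity of NADH at 340 nm (M^-1 cm^-1).
# This value is an assumption brought in for illustration; it is not in the text.
NADH_EXTINCTION_M_CM = 6220.0


def adp_rate_um_per_min(delta_a340_per_min: float, path_cm: float = 1.0) -> float:
    """Convert a rate of A340 change into ADP formed, in micromolar per minute.

    Assumes a 1:1 stoichiometry between NADH oxidized and ADP formed, as in a
    pyruvate kinase / lactate dehydrogenase coupled assay.
    """
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    molar_per_min = abs(delta_a340_per_min) / (NADH_EXTINCTION_M_CM * path_cm)
    return molar_per_min * 1e6


# Example: an A340 decrease of 0.0622 per minute in a 1 cm path.
print(round(adp_rate_um_per_min(-0.0622), 2))
```

With a shorter path length the same absorbance change corresponds to a proportionally higher rate, which is why the path length should not be left at the default for plate readers.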

B. AIR synthetase assay

To assay AIR synthetase it is necessary to provide the substrate FGAM. The FGAM is
provided by the conversion of FGAR to FGAM in the same reaction mixture. If NADH is
added the conversion can be followed utilizing the FGAM synthetase assay. When the
FGAR-FGAM conversion proceeds sufficiently (approximately 50 µM) then AIR synthetase is
added. Adding the AIR synthetase after the production of FGAM ensures that the initial
concentration of FGAM is constant in all reaction wells. The AIR synthetase is assayed by
the method of Bratton and Marshall (J. Biol. Chem. (1939) 128, 537-550). After a sufficient
time for AIR production (typically 15 minutes) the enzyme reaction is stopped with TCA.
The AIR is derivatized with sodium nitrite and the nitrite is subsequently neutralized with
ammonium sulfamate. The color is developed with the addition of
N-(1-naphthyl)ethylenediamine dihydrochloride (NEDD). After 10 minutes the color is
monitored at 530 nm.

C. Assay Protocols 

The assays are carried out in the same way independent of the original source of
the enzymes. The assays are performed in 300 96 well microtiter plates. The total assay
reaction volume is 200 µL. Substrates (except FGAR) are mixed in a ratio such that the
final concentrations (in the microtiter plate) are as follows: L-glutamine (600 mM), ATP (600
µM), PEP (1 mM), and NADH (200 µM). A mixture of substrates at 10X concentration can
be pipetted at 20 µL/well. The reagent enzymes and FGAM synthetase can also be mixed to
be added simultaneously. The suggested amounts of the ADP detecting/regeneration mix
are 0.7 units pyruvate kinase and 0.97 units lactate dehydrogenase per reaction. This should
be used as a guideline and the amounts of enzyme adjusted empirically. The FGAR (200
µM) should be added after a two minute incubation period. After the FGAM synthetase
reaction proceeds to completion at a rate of approximately 10 µM/minute (this is within
10-15 minutes), the AIR synthetase is added. After an interval (determined by the activity of
the AIR synthetase) the reaction is stopped with 66 µL of 20% TCA in 1 M K3PO4. The plate
is spun in a centrifuge to pellet the precipitated protein, then the supernatant is transferred
to a separate microtiter plate for color development and reading. 1.2 µL of 10% sodium
nitrite is added and after 3 minutes 1.2 µL of 50% ammonium sulfamate is added
(neutralizes excess nitrite). One minute later, 8.3 µL of 1% NEDD are added and after 5
minutes, the plate is read at 530 nm using a microtiter plate reading UV/VIS
spectrophotometer. AICAR is used as a standard since AIR is not available for that
purpose. Based on AICAR a reasonable detection limit (3-fold OD over background) of 10
µM is easily attainable.
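
The 10X substrate mix described above follows from the usual dilution relation C1 x V1 = C2 x V2. The Python sketch below works this out for two of the listed substrates, PEP and NADH, using the final concentrations and volumes given in the protocol. The function name is hypothetical.

```python
def stock_concentration(final_conc: float,
                        well_volume_ul: float = 200.0,
                        added_ul: float = 20.0) -> float:
    """Concentration needed in the substrate mix so that `added_ul` of it,
    diluted to `well_volume_ul`, gives `final_conc` (C1 * V1 = C2 * V2).

    The result is in the same concentration unit as `final_conc`.
    """
    if added_ul <= 0 or well_volume_ul < added_ul:
        raise ValueError("volumes must satisfy 0 < added_ul <= well_volume_ul")
    return final_conc * well_volume_ul / added_ul


# Final concentrations in the well, in mM (NADH: 200 micromolar = 0.2 mM).
finals_mm = {"PEP": 1.0, "NADH": 0.2}
stocks_mm = {name: stock_concentration(c) for name, c in finals_mm.items()}

for name, conc in stocks_mm.items():
    print(f"{name}: {conc:.3g} mM in the 10X mix")
```

Because 20 µL into 200 µL is a tenfold dilution, every component of the mix is simply ten times its final concentration; the function form makes it easy to recompute if a different addition volume is used.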

L-Glutamine, ATP, sodium nitrite, ammonium sulfamate, and NEDD are available
from Sigma Chemicals. FGAR is synthesized by the methods of Chen and Henderson
(Can. J. Chemistry (1970) 48: 2306-2309) or Carrington et al. (J. Chem. Soc. (1968) 6864).
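
The assays above yield one 530 nm reading per well. Where two reaction mixtures are compared, for example with and without a test chemical, the readings can be reduced to a percent inhibition. The Python sketch below shows one straightforward way to do that. The function name, the use of a no-enzyme blank, and the absorbance values in the example are all hypothetical and chosen only for illustration.

```python
def percent_inhibition(control_a530: float,
                       treated_a530: float,
                       blank_a530: float = 0.0) -> float:
    """Percent reduction of the blank-corrected signal relative to the control.

    control_a530: reading from the reaction without the test chemical
    treated_a530: reading from the reaction with the test chemical
    blank_a530:   background reading (for example, a well without enzyme)
    """
    control = control_a530 - blank_a530
    treated = treated_a530 - blank_a530
    if control <= 0:
        raise ValueError("control signal must exceed the blank")
    return 100.0 * (1.0 - treated / control)


# Made-up readings, for illustration only.
print(round(percent_inhibition(0.80, 0.30, blank_a530=0.05), 1))
```

A value near zero indicates no effect of the chemical, and a value near 100 indicates that the signal has dropped to background.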

Example 11: In vitro Recombination of AIR Synthetase Genes by DNA Shuffling 

The A. thaliana AIR synthetase gene encoding the pre-protein is amplified by PCR
as described in example 6. The resulting DNA fragment is digested by DNaseI treatment
essentially as described (Stemmer et al. (1994) PNAS 91: 10747-10751) and the PCR
primers are removed from the reaction mixture. A PCR reaction is carried out without
primers and is followed by a PCR reaction with the primers, both as described (Stemmer et
al. (1994) PNAS 91: 10747-10751). The resulting DNA fragments are cloned into pTRC99a
(Pharmacia, Cat. no. 27-5007-01) and transformed into E. coli strain S066O9/1KU (Schnorr et
al. (1994) Plant Journal 6: 113-121) by electroporation using the Biorad Gene Pulser and
the manufacturer's conditions. The transformed bacteria are grown on medium that contains
inhibitory concentrations of the inhibitor and those colonies that grow in the presence of the
inhibitor are selected. Colonies that grow in the presence of normally inhibitory
concentrations of inhibitor are picked and purified by repeated restreaking. Their plasmids
are purified and the DNA sequences of cDNA inserts from plasmids that pass this test are
then determined.

In a similar reaction, PCR-amplified DNA fragments comprising the A. thaliana AIR
synthetase gene encoding the pre-protein and PCR-amplified DNA fragments comprising
the E. coli purM gene are recombined in vitro and resulting variants with improved tolerance
to the inhibitor are recovered as described above.
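
The products of in vitro recombination between two parental genes can be pictured as chimeras that switch from one parent to the other at crossover points. The Python sketch below is a toy model of that outcome only. It does not simulate fragmentation, reassembly PCR, or selection, and it assumes the two parents have already been aligned to equal length. The function name and the placeholder sequences are hypothetical.

```python
def chimera(parent_a: str, parent_b: str, crossovers: list[int]) -> str:
    """Build a chimera from two aligned, equal-length parent sequences.

    The chimera starts on parent_a and switches to the other parent at each
    index listed in `crossovers`.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    if any(not 0 <= p <= len(parent_a) for p in crossovers):
        raise ValueError("crossover positions must lie within the sequence")

    parents = (parent_a, parent_b)
    pieces, current, start = [], 0, 0
    for point in sorted(crossovers) + [len(parent_a)]:
        pieces.append(parents[current][start:point])
        current, start = 1 - current, point   # switch template
    return "".join(pieces)


# Placeholder parents: letters mark which parent each position came from.
print(chimera("AAAAAAAAAA", "BBBBBBBBBB", [3, 7]))   # AAABBBBAAA
```

In a real library the crossover positions are not chosen by the experimenter; they arise where fragments from the two parents share enough identity to prime each other.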

Example 12: In vitro Recombination of AIR Synthetase Genes by Staggered Extension 
Process 

The A. thaliana AIR synthetase gene encoding the mature protein and the E. coli
purM gene are each cloned into the polylinker of a pBluescript vector. A PCR reaction is
carried out essentially as described (Zhao et al. (1998) Nature Biotechnology 16: 258-261)
using the "reverse primer" and the "M13 -20 primer" (Stratagene Catalog). Amplified PCR
fragments are digested with appropriate restriction enzymes and cloned into pTRC99a and
mutated AIR synthetase genes are screened as described in example 11.

The above disclosed embodiments are illustrative. This disclosure of the invention 
will place one skilled in the art in possession of many variations of the invention. All such 
obvious and foreseeable variations are intended to be encompassed by the appended 
claims.

What Is Claimed Is:

1. An isolated enzyme comprising an amino acid sequence that is identical or
substantially similar to SEQ ID NO:2 or SEQ ID NO:4, wherein said enzyme has
5'-phosphoribosyl-5-aminoimidazole (AIR) synthetase activity.

2. An isolated enzyme according to claim 1, wherein said amino acid sequence is 
derived from a plant. 

3. An isolated enzyme according to claim 1, wherein said amino acid sequence is 
SEQ ID NO:2. 

4. An isolated enzyme according to claim 1, wherein said amino acid sequence is 
SEQ ID NO:4. 

5. An isolated nucleic acid molecule comprising a nucleotide sequence that
encodes the amino acid sequence set forth in SEQ ID NO:2 or SEQ ID NO:4.

6. An isolated nucleic acid molecule according to claim 5, wherein said nucleotide
sequence is SEQ ID NO:1 or SEQ ID NO:3.

7. An isolated nucleic acid molecule according to claim 5, wherein said nucleotide
sequence is deposited in E. coli strain DHSapASM designated as NRRL B-21976.

8. A chimeric gene comprising a heterologous promoter sequence operatively 
linked to the nucleic acid molecule of claim 5. 

9. A recombinant vector comprising the chimeric gene of claim 8. 

10. A host cell comprising the chimeric gene of claim 8. 

11. A host cell according to claim 10, which is a bacterial cell.

12. A host cell according to claim 10, which is a yeast cell.

13. A host cell according to claim 10, which is a plant cell. 

14. A plant comprising the plant cell of claim 13.

15. Seed from the plant of claim 14.

16. A method of identifying chemicals having the ability to inhibit plant growth or 
viability, comprising: 

(a) combining an enzyme having AIR synthetase activity in a first reaction mixture 
with a substrate of AIR synthetase under conditions in which the enzyme is 
capable of catalyzing the synthesis of AIR; 

(b) combining a chemical to be tested and the enzyme in a second reaction mixture 
with a substrate of AIR synthetase under the same conditions and for the same 
period of time as in the first reaction mixture; and 

(c) determining and comparing the activity of the enzyme in the first and second 
reaction mixtures; 

wherein less enzyme activity in the second reaction mixture than in the first reaction mixture 
indicates that the chemical of (b) has the ability to inhibit plant growth or viability. 

17. A method of identifying chemicals having the ability to inhibit plant growth or 
viability, comprising: 

(a) combining an enzyme having 5'-phosphoribosyl-N-formylglycinamidine (FGAM)
synthetase activity and an enzyme having AIR synthetase activity in a first
reaction mixture with a substrate of FGAM synthetase under conditions in which
the enzymes are capable of catalyzing the coupled synthesis of AIR;

(b) combining a chemical to be tested and the enzymes in a second reaction
mixture with a substrate of FGAM synthetase under the same conditions and the
same period of time as in the first reaction mixture; and

(c) determining and comparing the activity of the enzyme having AIR synthetase
activity in the first and second reaction mixtures;

wherein less enzyme activity in the second reaction mixture than in the first reaction mixture 
indicates that the chemical of (b) has the ability to inhibit plant growth or viability.

18. A method for identifying chemicals having herbicidal activity that inhibit AIR
synthetase activity in plants, comprising:

(a) obtaining transgenic plants, plant tissue, plant seeds or plant cells comprising an
isolated nucleotide sequence encoding an enzyme having AIR synthetase
activity and capable of overexpressing an enzymatically active AIR synthetase;

(b) applying a chemical to be tested to the transgenic plants, plant cells, tissues or
parts and to the isogenic non-transformed plants, plant cells, tissues or parts;

(c) determining the growth or viability of the transgenic and non-transformed plants, 
plant cells, tissues after application of the chemical; and 

(d) comparing the growth or viability of the transgenic and non-transformed plants, 
plant cells, tissues after application of the chemical; 

wherein suppression of the growth or viability of the non-transgenic plants, plant cells, 
tissues or parts, without significantly suppressing the growth or viability of the isogenic 
transgenic plants, plant cells, tissues or parts indicates that the chemical of (b) has 
herbicidal activity that inhibits AIR synthetase activity in plants. 

19. A method according to claim 16, wherein the substrate is
5'-phosphoribosyl-N-formylglycinamidine (FGAM).

20. A method according to claim 16, wherein the substrate is β-FGAM.

21. A method according to claim 17, wherein the substrate is
5'-phosphoribosyl-N-formylglycinamide (FGAR).

22. A method according to claim 17, wherein the substrate is β-FGAR.

23. A method according to any of claims 16-18, wherein the enzyme having AIR 
synthetase activity is derived from a plant. 

24. A method according to any of claims 16-18, wherein the enzyme having AIR 
synthetase activity comprises an amino acid sequence identical or substantially similar to 
the amino acid sequence set forth in SEQ ID NO:2 or SEQ ID NO:4.

25. A method according to any of claims 16-18, wherein the enzyme having AIR
synthetase activity is derived from E. coli.

26. A method according to any of claims 16-18, wherein the activity of the enzyme is
determined by measuring the AIR produced in the reaction mixture.

27. A method for suppressing the growth of undesired vegetation, comprising the
step of applying to the undesired vegetation a chemical identified by the method of any of
claims 16-18.

28. A transgenic plant, plant cell, plant seed, or plant tissue comprising a nucleotide 
sequence encoding an enzyme having AIR synthetase activity, wherein the nucleotide 
sequence confers upon said transgenic plant, plant cell, plant seed, or plant tissue 
tolerance to a chemical identified by the method of any of claims 16-18 in amounts that
normally inhibit AIR synthetase activity in a wild-type plant.

29. A plant made by a process comprising transforming the plant or a parent of the
plant with an isolated DNA molecule comprising a nucleotide sequence encoding an 
enzyme having AIR synthetase activity and capable of expressing the nucleotide sequence 
in the plant so as to render the plant tolerant to a chemical identified by the method of any 
of claims 16-18. 

30. A method for selectively suppressing the growth of weeds in a field containing a
crop of planted crop seeds or plants, comprising:

(a) planting herbicide tolerant crops or crop seeds, which are plants or plant seeds 
transformed with an isolated DNA molecule comprising a nucleotide sequence having AIR 
synthetase activity, wherein said nucleotide sequence is expressible in said plant or plant 
seed; and 

(b) applying to the crops or crop seeds and the weeds in the field a herbicide in 
amounts that inhibit naturally occurring AIR synthetase activity, wherein the herbicide 
suppresses the growth of the weeds without significantly suppressing the growth of the 
crops.

31. A method for forming a mutagenized DNA molecule encoding an enzyme having
AIR synthetase activity from a template DNA molecule encoding an enzyme having AIR 
synthetase activity, wherein said template DNA molecule has been cleaved into double- 
stranded-random fragments, comprising the steps of: 

(a) adding to the resultant population of double-stranded-random fragments at least 
one single-stranded or double-stranded oligonucleotide, wherein said 
oligonucleotide comprises an area of identity and an area of heterology to the 
template DNA molecule; 

(b) denaturing the resultant mixture of double-stranded-random fragments and 
oligonucleotides into single-stranded molecules; 

(c) incubating the resultant population of single-stranded molecules with a 
polymerase under conditions which result in the annealing of said single- 
stranded molecules at said areas of identity to form pairs of annealed 
fragments, said areas of identity being sufficient for one member of a pair to 
prime replication of the other, thereby forming a mutagenized double-stranded 
polynucleotide; 

(d) repeating the second and third steps for at least two further cycles, wherein the 
resultant mixture in the second step of a further cycle includes the mutagenized 
double-stranded polynucleotide from the third step of the previous cycle, and the
further cycle forms a further mutagenized double-stranded polynucleotide; 

wherein the mutagenized double-stranded polynucleotide encodes an AIR synthetase 
enzyme having enhanced tolerance to a herbicide which inhibits the AIR synthetase activity 
encoded by the template DNA molecule. 

32. A method for forming a mutagenized DNA molecule encoding an enzyme having 
AIR synthetase activity from at least two non-identical template DNA molecules encoding 
enzymes having AIR synthetase activity, comprising the steps of: 

(a) adding to the template DNA molecules at least one oligonucleotide comprising
an area of identity to each of the template DNA molecules;

(b) denaturing the resultant mixture into single-stranded molecules;

(c) incubating the resultant population of single-stranded molecules with a
polymerase under conditions which result in the annealing of the
oligonucleotides to the template DNA molecules, wherein the conditions for
polymerization by the polymerase are such that polymerization products
corresponding to a portion of the template DNA molecules are obtained;

(d) repeating the second and third steps for at least two further cycles, wherein the
extension products obtained in the third step are able to switch template DNA
molecule for polymerization in the next cycle, thereby forming a mutagenized
double-stranded polynucleotide comprising sequences derived from different
template DNA molecules;

wherein the mutagenized double-stranded polynucleotide encodes an AIR synthetase
enzyme having enhanced tolerance to a herbicide which inhibits the AIR synthetase activity
encoded by the template DNA molecules.

33. A mutagenized DNA molecule encoding an enzyme having AIR synthetase
activity obtained by the method of claim 31, wherein said mutagenized DNA molecule
encodes an AIR synthetase enzyme having enhanced tolerance to a herbicide which
inhibits the AIR synthetase activity encoded by said template DNA molecule.

34. A mutagenized DNA molecule encoding an enzyme having AIR synthetase
activity obtained by the method of claim 32, wherein said mutagenized DNA molecule
encodes an AIR synthetase enzyme having enhanced tolerance to a herbicide which
inhibits the AIR synthetase activity encoded by said template DNA molecule.

35. The method of claim 31 or claim 32, wherein at least one template DNA
molecule is derived from a eukaryote.

36. The method of claim 35, wherein said eukaryote is a plant.

37. The method of claim 36, wherein said plant is Arabidopsis thaliana.

38. The method of claim 37, wherein said species of template DNA molecule is
identical or substantially similar to the SEQ ID NO:1 or SEQ ID NO:3.

39. The method of claim 31 or claim 32, wherein one template DNA molecule is
derived from a prokaryote.

SEQUENCE LISTING

<110> Novartis AG 

<120> METHODS TO SCREEN HERBICIDAL COMPOUNDS AND USES THEREOF 

<130> PH/5-30552/A/CGC1999 

<140> 
<141> 

<150> US 09/103,895 
<151> 1998-06-24 

<160> 10 

<170> PatentIn Ver. 2.0

<210> 1 
<211> 1172 
<212> DNA 

<213> Arabidopsis thaliana 

<220> 

<221> CDS 

<222> (3)..(1160)

<223> AIR synthetase cDNA 

<400> 1 

cc atg gaa gct cgg att ttg cag tct tct tct tcc tgt tat tcg tct  47
   Met Glu Ala Arg Ile Leu Gln Ser Ser Ser Ser Cys Tyr Ser Ser
     1               5                  10                  15

ctt tac act gtc aat cga tcc cgg ttc tct tct ccg aaa cct ttc tcc  95
Leu Tyr Thr Val Asn Arg Ser Arg Phe Ser Ser Pro Lys Pro Phe Ser
                 20                  25                  30

gtc agc ttt gct cag acg acg aga aca agg act cgt gta tta tcc atg  143
Val Ser Phe Ala Gln Thr Thr Arg Thr Arg Thr Arg Val Leu Ser Met
             35                  40                  45

tcg aag aaa gat ggt cgc act gat aaa gat gat gac act gat agt ctc  191
Ser Lys Lys Asp Gly Arg Thr Asp Lys Asp Asp Asp Thr Asp Ser Leu
         50                  55                  60

aat tac aaa gat tct ggt gtt gat atc gat gct ggt gct gag ctt gtt  239
Asn Tyr Lys Asp Ser Gly Val Asp Ile Asp Ala Gly Ala Glu Leu Val
     65                  70                  75

aaa cga atc gca aag atg gct cct gga att ggt gga ttt ggt ggt ctc  287
Lys Arg Ile Ala Lys Met Ala Pro Gly Ile Gly Gly Phe Gly Gly Leu
 80                  85                  90                  95

ttt cca tta ggt gat agt tat ctt gta gct ggt acg gat ggt gta ggg  335
Phe Pro Leu Gly Asp Ser Tyr Leu Val Ala Gly Thr Asp Gly Val Gly
                100                 105                 110

act aaa ttg aaa ttg gca ttt gaa act gga att cat gac acc att gga  383
Thr Lys Leu Lys Leu Ala Phe Glu Thr Gly Ile His Asp Thr Ile Gly
            115                 120                 125

atc gac ttg gtt gct atg agt gtg aat gat att att act tct ggt gca  431
Ile Asp Leu Val Ala Met Ser Val Asn Asp Ile Ile Thr Ser Gly Ala
        130                 135                 140

aag cct ctg ttt ttc ctt gat tac ttt gct act agt cgt ctt gat gta  479
Lys Pro Leu Phe Phe Leu Asp Tyr Phe Ala Thr Ser Arg Leu Asp Val
    145                 150                 155

gac ctt gct gaa aag gtc att aaa ggg att gtt gaa ggt tgt cgg caa  527
Asp Leu Ala Glu Lys Val Ile Lys Gly Ile Val Glu Gly Cys Arg Gln
160                 165                 170                 175

tcg gaa tgt gct ctc tta ggg gga gag act gca gag atg cct gac ttt  575
Ser Glu Cys Ala Leu Leu Gly Gly Glu Thr Ala Glu Met Pro Asp Phe
                180                 185                 190

tat gca gag ggc gag tac gat cta agt ggg ttt gca gta ggc ata gta  623
Tyr Ala Glu Gly Glu Tyr Asp Leu Ser Gly Phe Ala Val Gly Ile Val
            195                 200                 205

aag aaa act tca gtt atc aac gga aaa aac att gtg gcc ggt gat gtt  671
Lys Lys Thr Ser Val Ile Asn Gly Lys Asn Ile Val Ala Gly Asp Val
        210                 215                 220

ctt att ggc ctc ccg tct agt ggt gtt cat tcc aat ggt ttt tct cta  719
Leu Ile Gly Leu Pro Ser Ser Gly Val His Ser Asn Gly Phe Ser Leu
    225                 230                 235

gta aga agg gta ttg gct cga agc aat ctt tcg ctg aat gat gcg ctt  767
Val Arg Arg Val Leu Ala Arg Ser Asn Leu Ser Leu Asn Asp Ala Leu
240                 245                 250                 255

cca ggt gga tca agt acc ctt ggt gat gct cta atg gca ccc act gtc  815
Pro Gly Gly Ser Ser Thr Leu Gly Asp Ala Leu Met Ala Pro Thr Val
                260                 265                 270

att tac gtg aaa cag gta ctt gat atg ata gaa aaa gga gga gtg aaa  863
Ile Tyr Val Lys Gln Val Leu Asp Met Ile Glu Lys Gly Gly Val Lys
            275                 280                 285

ggt tta gct cat atc aca ggc gga ggt ttc aca gac aac att ccc cga  911
Gly Leu Ala His Ile Thr Gly Gly Gly Phe Thr Asp Asn Ile Pro Arg
        290                 295                 300

gtc ttc ccg gac ggt ttg ggt gct gtt att cac acc gat act tgg gaa  959
Val Phe Pro Asp Gly Leu Gly Ala Val Ile His Thr Asp Thr Trp Glu
    305                 310                 315

ctt cca ccg ttg ttc aag tgg att caa cag act ggg aga ata gaa gac  1007
Leu Pro Pro Leu Phe Lys Trp Ile Gln Gln Thr Gly Arg Ile Glu Asp
320                 325                 330                 335

agt gag atg aga agg acg ttt aac ctg ggg ata ggg atg gtt atg gtg  1055
Ser Glu Met Arg Arg Thr Phe Asn Leu Gly Ile Gly Met Val Met Val
                340                 345                 350

gtt agt cca gag gca gct tca cga ata cta gaa gaa gtc aag aat gga  1103
Val Ser Pro Glu Ala Ala Ser Arg Ile Leu Glu Glu Val Lys Asn Gly
            355                 360                 365

gac tat gtt gcg tat cgc gta gga gag gtt gtc aac ggt gaa ggc gta  1151
Asp Tyr Val Ala Tyr Arg Val Gly Glu Val Val Asn Gly Glu Gly Val
        370                 375                 380

agc tat cag tagtgaggat cc  1172
Ser Tyr Gln
    385

<210> 2
<211> 386 
<212> PRT 

<213> Arabidopsis thaliana 

<400> 2

Met Glu Ala Arg Ile Leu Gln Ser Ser Ser Ser Cys Tyr Ser Ser Leu
  1               5                  10                  15

Tyr Thr Val Asn Arg Ser Arg Phe Ser Ser Pro Lys Pro Phe Ser Val
             20                  25                  30

Ser Phe Ala Gln Thr Thr Arg Thr Arg Thr Arg Val Leu Ser Met Ser
         35                  40                  45

Lys Lys Asp Gly Arg Thr Asp Lys Asp Asp Asp Thr Asp Ser Leu Asn
     50                  55                  60

Tyr Lys Asp Ser Gly Val Asp Ile Asp Ala Gly Ala Glu Leu Val Lys
 65                  70                  75                  80

Arg Ile Ala Lys Met Ala Pro Gly Ile Gly Gly Phe Gly Gly Leu Phe
                 85                  90                  95

Pro Leu Gly Asp Ser Tyr Leu Val Ala Gly Thr Asp Gly Val Gly Thr
            100                 105                 110

Lys Leu Lys Leu Ala Phe Glu Thr Gly Ile His Asp Thr Ile Gly Ile
        115                 120                 125

Asp Leu Val Ala Met Ser Val Asn Asp Ile Ile Thr Ser Gly Ala Lys
    130                 135                 140

Pro Leu Phe Phe Leu Asp Tyr Phe Ala Thr Ser Arg Leu Asp Val Asp
145                 150                 155                 160

Leu Ala Glu Lys Val Ile Lys Gly Ile Val Glu Gly Cys Arg Gln Ser
                165                 170                 175

Glu Cys Ala Leu Leu Gly Gly Glu Thr Ala Glu Met Pro Asp Phe Tyr
            180                 185                 190

Ala Glu Gly Glu Tyr Asp Leu Ser Gly Phe Ala Val Gly Ile Val Lys
        195                 200                 205

Lys Thr Ser Val Ile Asn Gly Lys Asn Ile Val Ala Gly Asp Val Leu
    210                 215                 220

Ile Gly Leu Pro Ser Ser Gly Val His Ser Asn Gly Phe Ser Leu Val
225                 230                 235                 240

Arg Arg Val Leu Ala Arg Ser Asn Leu Ser Leu Asn Asp Ala Leu Pro
                245                 250                 255

Gly Gly Ser Ser Thr Leu Gly Asp Ala Leu Met Ala Pro Thr Val Ile
            260                 265                 270

Tyr Val Lys Gln Val Leu Asp Met Ile Glu Lys Gly Gly Val Lys Gly
        275                 280                 285

Leu Ala His Ile Thr Gly Gly Gly Phe Thr Asp Asn Ile Pro Arg Val
    290                 295                 300

Phe Pro Asp Gly Leu Gly Ala Val Ile His Thr Asp Thr Trp Glu Leu
305                 310                 315                 320

Pro Pro Leu Phe Lys Trp Ile Gln Gln Thr Gly Arg Ile Glu Asp Ser
                325                 330                 335

Glu Met Arg Arg Thr Phe Asn Leu Gly Ile Gly Met Val Met Val Val
            340                 345                 350

Ser Pro Glu Ala Ala Ser Arg Ile Leu Glu Glu Val Lys Asn Gly Asp
        355                 360                 365

Tyr Val Ala Tyr Arg Val Gly Glu Val Val Asn Gly Glu Gly Val Ser
    370                 375                 380

Tyr Gln
385



<210> 3 
<211> 1013 
<212> DNA 

<213> Arabidopsis thaliana 
<220> 

<221> mat_peptide 
<222> (3)..(1001)

<223> coding sequence of AIR synthetase putative mature 
peptide 

<220> 

<221> CDS 

<222> (3)..(1001)



cc atg gat aaa gat gat gac act gat agt ctc aat tac aaa gat tct 47
Met Asp Lys Asp Asp Asp Thr Asp Ser Leu Asn Tyr Lys Asp Ser
1 5 10 15

ggn gtt gat atc gat gct ggt gct gag ctt gtt aaa cga atc gca aag 95
Gly Val Asp Ile Asp Ala Gly Ala Glu Leu Val Lys Arg Ile Ala Lys
20 25 30

atg gct cct gga att ggt gga ttt ggt ggt ctc ttt cca tta ggt gat 143
Met Ala Pro Gly Ile Gly Gly Phe Gly Gly Leu Phe Pro Leu Gly Asp
35 40 45

agt tat ctt gta gct ggt acg gat ggt gta ggg act aaa ttg aaa ttg 191
Ser Tyr Leu Val Ala Gly Thr Asp Gly Val Gly Thr Lys Leu Lys Leu
50 55 60

gca ttt gaa act gga att cat gac acc att gga atc gac ttg gtt gct 239
Ala Phe Glu Thr Gly Ile His Asp Thr Ile Gly Ile Asp Leu Val Ala
65 70 75

atg agt gtg aat gat att att act tct ggt gca aag cct ctg ttt ttc 287
Met Ser Val Asn Asp Ile Ile Thr Ser Gly Ala Lys Pro Leu Phe Phe
80 85 90 95

ctt gat tac ttt gct act agt cgt ctt gat gta gac ctt gct gaa aag 335
Leu Asp Tyr Phe Ala Thr Ser Arg Leu Asp Val Asp Leu Ala Glu Lys
100 105 110

gtc att aaa ggg att gtt gaa ggt tgt cgg caa tcg gaa tgt gct ctc 383
Val Ile Lys Gly Ile Val Glu Gly Cys Arg Gln Ser Glu Cys Ala Leu
115 120 125

tta ggg gga gag act gca gag atg cct gac ttt tat gca gag ggc gag 431
Leu Gly Gly Glu Thr Ala Glu Met Pro Asp Phe Tyr Ala Glu Gly Glu
130 135 140

tac gat cta agt ggg ttt gca gta ggc ata gta aag aaa act tca gtt 479
Tyr Asp Leu Ser Gly Phe Ala Val Gly Ile Val Lys Lys Thr Ser Val
145 150 155

atc aac gga aaa aac att gtg gcc ggt gat gtt ctt att ggc ctc ccg 527
Ile Asn Gly Lys Asn Ile Val Ala Gly Asp Val Leu Ile Gly Leu Pro
160 165 170 175

tct agt ggt gtt cat tcc aat ggt ttt tct cta gta aga agg gta ttg 575
Ser Ser Gly Val His Ser Asn Gly Phe Ser Leu Val Arg Arg Val Leu
180 185 190

gct cga agc aat ctt tcg ctg aat gat gcg ctt cca ggt gga tca agt 623
Ala Arg Ser Asn Leu Ser Leu Asn Asp Ala Leu Pro Gly Gly Ser Ser
195 200 205

acc ctt ggt gat gct cta atg gca ccc act gtc att tac gtg aaa cag 671
Thr Leu Gly Asp Ala Leu Met Ala Pro Thr Val Ile Tyr Val Lys Gln
210 215 220

gta ctt gat atg ata gaa aaa gga gga gtg aaa ggt tta gct cat atc 719
Val Leu Asp Met Ile Glu Lys Gly Gly Val Lys Gly Leu Ala His Ile
225 230 235

aca ggc gga ggt ttc aca gac aac att ccc cga gtc ttc ccg gac ggt 767
Thr Gly Gly Gly Phe Thr Asp Asn Ile Pro Arg Val Phe Pro Asp Gly
240 245 250 255

ttg ggt gct gtt att cac acc gat act tgg gaa ctt cca ccg ttg ttc 815
Leu Gly Ala Val Ile His Thr Asp Thr Trp Glu Leu Pro Pro Leu Phe
260 265 270

aag tgg att caa cag act ggg aga ata gaa gac agt gag atg aga agg 863
Lys Trp Ile Gln Gln Thr Gly Arg Ile Glu Asp Ser Glu Met Arg Arg
275 280 285

acg ttt aac ctg ggg ata ggg atg gtt atg gtg gtt agt cca gag gca 911
Thr Phe Asn Leu Gly Ile Gly Met Val Met Val Val Ser Pro Glu Ala
290 295 300

gct tca cga ata cta gaa gaa gtc aag aat gga gac tat gtt gcg tat 959
Ala Ser Arg Ile Leu Glu Glu Val Lys Asn Gly Asp Tyr Val Ala Tyr
305 310 315

cgc gta gga gag gtt gtc aac ggt gaa ggc gta agc tat cag 1001
Arg Val Gly Glu Val Val Asn Gly Glu Gly Val Ser Tyr Gln
320 325 330

tagtgaggat cc 1013

<210> 4 
<211> 333 
<212> PRT 

<213> Arabidopsis thaliana 
<400> 4 

Met Asp Lys Asp Asp Asp Thr Asp Ser Leu Asn Tyr Lys Asp Ser Gly
1 5 10 15

Val Asp Ile Asp Ala Gly Ala Glu Leu Val Lys Arg Ile Ala Lys Met
20 25 30

Ala Pro Gly Ile Gly Gly Phe Gly Gly Leu Phe Pro Leu Gly Asp Ser
35 40 45

Tyr Leu Val Ala Gly Thr Asp Gly Val Gly Thr Lys Leu Lys Leu Ala
50 55 60

Phe Glu Thr Gly Ile His Asp Thr Ile Gly Ile Asp Leu Val Ala Met
65 70 75 80

Ser Val Asn Asp Ile Ile Thr Ser Gly Ala Lys Pro Leu Phe Phe Leu
85 90 95

Asp Tyr Phe Ala Thr Ser Arg Leu Asp Val Asp Leu Ala Glu Lys Val
100 105 110

Ile Lys Gly Ile Val Glu Gly Cys Arg Gln Ser Glu Cys Ala Leu Leu
115 120 125

Gly Gly Glu Thr Ala Glu Met Pro Asp Phe Tyr Ala Glu Gly Glu Tyr
130 135 140

Asp Leu Ser Gly Phe Ala Val Gly Ile Val Lys Lys Thr Ser Val Ile
145 150 155 160

Asn Gly Lys Asn Ile Val Ala Gly Asp Val Leu Ile Gly Leu Pro Ser
165 170 175

Ser Gly Val His Ser Asn Gly Phe Ser Leu Val Arg Arg Val Leu Ala
180 185 190

Arg Ser Asn Leu Ser Leu Asn Asp Ala Leu Pro Gly Gly Ser Ser Thr
195 200 205

Leu Gly Asp Ala Leu Met Ala Pro Thr Val Ile Tyr Val Lys Gln Val
210 215 220

Leu Asp Met Ile Glu Lys Gly Gly Val Lys Gly Leu Ala His Ile Thr
225 230 235 240

Gly Gly Gly Phe Thr Asp Asn Ile Pro Arg Val Phe Pro Asp Gly Leu
245 250 255

Gly Ala Val Ile His Thr Asp Thr Trp Glu Leu Pro Pro Leu Phe Lys
260 265 270

Trp Ile Gln Gln Thr Gly Arg Ile Glu Asp Ser Glu Met Arg Arg Thr
275 280 285

Phe Asn Leu Gly Ile Gly Met Val Met Val Val Ser Pro Glu Ala Ala
290 295 300

Ser Arg Ile Leu Glu Glu Val Lys Asn Gly Asp Tyr Val Ala Tyr Arg
305 310 315 320

Val Gly Glu Val Val Asn Gly Glu Gly Val Ser Tyr Gln
325 330




<210> 5
<211> 22
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence:
oligonucleotide JG-L

<400> 5
gtacctcgag tctagactcg ag 22

<210> 6
<211> 28
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence:
oligonucleotide AS-1

<400> 6
gatcgagctc gttctcttct gtgtcatc 28

<210> 7
<211> 28
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence:
oligonucleotide AS-2

<400> 7
gatcccatgg tccccaggta aagacgtc 28

<210> 8
<211> 36
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence:
oligonucleotide slp242

<400> 8
cgcggatcct cactactgat agcttacgcc ttcacc 36

<210> 9
<211> 26
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence:
oligonucleotide slp244

<400> 9
ttgaagccat ggaagctcgg attttg 26

<210> 10
<211> 37
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence:
oligonucleotide slp243

<400> 10
cgcatgccat ggataaagat gatgacactg atagtct 37

The present invention provides plant riboflavin biosynthesis genes, including a gene that encodes the β subunit of the plant riboflavin 
synthase enzyme complex (lumazine synthase) and a gene that encodes the bifunctional enzyme GTP cyclohydrolase II / DHBP synthase. 
Also disclosed are the recombinant production of these plant riboflavin biosynthesis enzymes in heterologous hosts, the screening of chemicals 
for herbicidal activity using these recombinantly produced enzymes, and the use of thereby identified herbicidal chemicals to suppress the 
growth of undesired vegetation. Furthermore, the present invention provides methods for the development of herbicide tolerance in plants, 
plant tissues, plant seeds and plant cells using the riboflavin biosynthesis genes of the invention. 

RIBOFLAVIN BIOSYNTHESIS GENES FROM PLANTS AND USES THEREOF 

The invention relates generally to enzymatic activity involved in riboflavin biosynthesis 
in plants. In particular, the invention relates to plant genes that encode the bifunctional 
GTP cyclohydrolase II / DHBP synthase enzyme and the β subunit of the riboflavin 
synthase enzyme complex (lumazine synthase). The invention has various utilities, 
including the recombinant production of these riboflavin biosynthesis enzymes in 
heterologous hosts, the screening of chemicals for herbicidal activity, and the use of 
thereby identified herbicidal chemicals to control the growth of undesired vegetation. The 
invention may also be applied to the development of herbicide tolerance in plants, plant 
tissues, plant seeds, and plant cells. 

I. Riboflavin Biosynthesis 

Riboflavin (vitamin B2; 6,7-dimethyl-9-(1-D-ribityl)-isoalloxazine) is synthesized by 
all plants and many microorganisms. Riboflavin is essential to basic metabolism because it 
is a precursor to coenzymes such as FAD and FMN, which are required in the enzymatic 
oxidation of carbohydrates. Biosynthesis of riboflavin starts from guanosine-5'-triphosphate 
(GTP) and proceeds through several enzymatic steps, as outlined in Figure 1 of Mironov et 
al., Mol. Gen. Genet. 242:201-208 (1994), incorporated herein by reference. 

GTP cyclohydrolase II is the first enzyme of riboflavin biosynthesis, catalyzing the 
synthesis of 2,5-diamino-4-oxy-6-ribosylamino-pyrimidine-5'-phosphate from GTP. DHBP 
synthase catalyzes the conversion of ribulose-5-phosphate to 3,4-dihydroxy-2-butanone 
phosphate (DHBP). In Bacillus, these two enzymatic activities are carried out by a single, 
bifunctional enzyme; in E. coli, however, these two enzymatic activities are carried out by 
two separate enzymes. 

The riboflavin synthase protein is an approximately 1,000,000-Da enzyme complex 
consisting of approximately 60 β subunits and three α subunits. The β subunits form a 
capsid that catalyzes the conversion of 2,4-dioxy-5-amino-6-ribitylamino-pyrimidine (DARP) 
and 3,4-dihydroxy-2-butanone phosphate (DHBP) to 6,7-dimethyl-8-ribityllumazine 
(lumazine); hence, the β subunit is also known as "lumazine synthase". The α subunits, 
contained inside the β subunit capsid, then catalyze the conversion of two units of lumazine 
to one DARP molecule, which is recycled back into the first riboflavin synthase reaction, and 
one riboflavin molecule. 






PCT/EP99/00556 

WO 99/38986 




II. Herbicide Discovery 

The use of herbicides to control undesirable vegetation such as weeds in crop fields 
has become almost a universal practice. The herbicide market exceeds 15 billion dollars 
annually. Despite this extensive use, weed control remains a significant and costly problem 
for farmers. 

Effective use of herbicides requires sound management. For instance, the time and 
method of application and stage of weed plant development are critical to getting good 
weed control with herbicides. Since various weed species are resistant to herbicides, the 
production of effective new herbicides becomes increasingly important. Novel herbicides 
can now be discovered using high-throughput screens that implement recombinant DNA 
technology. Metabolic enzymes essential to plant growth and development can be 
recombinantly produced through standard molecular biological techniques and utilized as 
herbicide targets in screens for novel inhibitors of the enzymes' activity. The novel 
inhibitors discovered through such screens may then be used as herbicides to control 
undesirable vegetation. 

III. Herbicide Tolerant Plants 

Herbicides that exhibit greater potency, broader weed spectrum, and more rapid 
degradation in soil can also, unfortunately, have greater crop phytotoxicity. One solution 
applied to this problem has been to develop crops that are resistant or tolerant to 
herbicides. Crop hybrids or varieties tolerant to the herbicides allow for the use of the 
herbicides to kill weeds without attendant risk of damage to the crop. Development of 
tolerance can allow application of a herbicide to a crop where its use was previously 
precluded or limited (e.g. to pre-emergence use) due to sensitivity of the crop to the 
herbicide. For example, U.S. Patent No. 4,761,373 to Anderson et al. is directed to plants 
resistant to various imidazolinone or sulfonamide herbicides. The resistance is conferred by 
an altered acetohydroxyacid synthase (AHAS) enzyme. U.S. Patent No. 4,975,374 to 
Goodman et al. relates to plant cells and plants containing a gene encoding a mutant 
glutamine synthetase (GS) resistant to inhibition by herbicides that were known to inhibit 
GS, e.g. phosphinothricin and methionine sulfoximine. U.S. Patent No. 5,013,659 to 
Bedbrook et al. is directed to plants expressing a mutant acetolactate synthase that renders 
the plants resistant to inhibition by sulfonylurea herbicides. U.S. Patent No. 5,162,602 to 
Somers et al. discloses plants tolerant to inhibition by cyclohexanedione and 
aryloxyphenoxypropanoic acid herbicides. The tolerance is conferred by an altered acetyl 
coenzyme A carboxylase (ACCase). 

DEFINITIONS 

For clarity, certain terms used in the specification are defined and presented as 
follows: 

Activatable DNA Sequence: a DNA sequence that regulates the expression of 
genes in a genome, desirably the genome of a plant. The activatable DNA sequence is 
complementary to a target gene endogenous in the genome. When the activatable DNA 
sequence is introduced and expressed in a cell, it inhibits expression of the target gene. An 
activatable DNA sequence useful in conjunction with the present invention includes those 
encoding or acting as dominant inhibitors, such as a translatable or untranslatable sense 
sequence capable of disrupting gene function in stably transformed plants to positively 
identify one or more genes essential for normal growth and development of a plant. A 
preferred activatable DNA sequence is an antisense DNA sequence. The target gene 
preferably encodes a protein, such as a biosynthetic enzyme, receptor, signal transduction 
protein, structural gene product, or transport protein that is essential to the growth or 
survival of the plant. In an especially preferred embodiment, the target gene encodes 
lumazine synthase or the bifunctional enzyme GTP cyclohydrolase II / DHBP synthase. The 
interaction of the antisense sequence and the target gene results in substantial inhibition of 
the expression of the target gene so as to kill the plant, or at least inhibit normal plant 
growth or development. 

Activatable DNA Construct: a recombinant DNA construct comprising a synthetic 
promoter operatively linked to the activatable DNA sequence, which when introduced into a 
cell, desirably a plant cell, is not expressed, i.e. is silent, unless a complete hybrid 
transcription factor capable of binding to and activating the synthetic promoter is present. 
The activatable DNA construct is introduced into cells, tissues, or plants to form stable 
transgenic lines capable of expressing the activatable DNA sequence. 

Chimeric: "chimeric" is used to indicate that a DNA sequence, such as a vector or a 
gene, comprises more than one DNA sequence of distinct origin, which are fused 
together by recombinant DNA techniques, resulting in a DNA sequence that does not 
occur naturally and that, in particular, does not occur in the plant to be transformed. 

DNA shuffling: DNA shuffling is a method to introduce mutations or 
rearrangements, preferably randomly, in a DNA molecule or to generate exchanges of DNA 
sequences between two or more DNA molecules, preferably randomly. The DNA molecule 
resulting from DNA shuffling is a shuffled DNA molecule that is a non-naturally occurring 
DNA molecule derived from at least one template DNA molecule. The shuffled DNA 
encodes an enzyme modified with respect to the enzyme encoded by the template DNA, 
and preferably has an altered biological activity with respect to the enzyme encoded by the 
template DNA. 

Enzyme activity: means herein the ability of an enzyme to catalyze the conversion 
of a substrate into a product. A substrate for the enzyme comprises the natural substrate of 
the enzyme but also comprises analogues of the natural substrate which can also be 
converted by the enzyme into a product or into an analogue of a product. The activity of the 
enzyme is measured for example by determining the amount of product in the reaction after 
a certain period of time, or by determining the amount of substrate remaining in the reaction 
mixture after a certain period of time. The activity of the enzyme is also measured by 
determining the amount of an unused co-factor of the reaction remaining in the reaction 
mixture after a certain period of time or by determining the amount of used co-factor in the 
reaction mixture after a certain period of time. The activity of the enzyme is also measured 
by determining the amount of a donor of free energy or energy-rich molecule (e.g. ATP, 
phosphoenolpyruvate, acetyl phosphate or phosphocreatine) remaining in the reaction 
mixture after a certain period of time or by determining the amount of a used donor of free 
energy or energy-rich molecule (e.g. ADP, pyruvate, acetate or creatine) in the reaction 
mixture after a certain period of time. 
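
As an illustration of the first two measurement approaches in this definition, the following minimal sketch computes an activity value either from the amount of product formed or from the amount of substrate remaining after a fixed time. The function names, the units (nmol, minutes, mg) and the 1:1 substrate-to-product stoichiometry are assumptions made for the example only; the specification does not prescribe them.

```python
def specific_activity(product_nmol: float, minutes: float, enzyme_mg: float) -> float:
    """Return activity as nmol of product formed per minute per mg of enzyme."""
    if minutes <= 0 or enzyme_mg <= 0:
        raise ValueError("minutes and enzyme_mg must be positive")
    return product_nmol / minutes / enzyme_mg


def activity_from_substrate(initial_nmol: float, remaining_nmol: float,
                            minutes: float, enzyme_mg: float) -> float:
    """Same quantity, inferred from substrate consumed.

    Assumes (for illustration) that one substrate molecule yields one product molecule.
    """
    consumed = initial_nmol - remaining_nmol
    return specific_activity(consumed, minutes, enzyme_mg)
```

Both routes give the same value when the stoichiometry assumption holds, which is why the definition treats them as interchangeable ways of measuring activity.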

Expression refers to the transcription and/or translation of an endogenous gene or 
a transgene in plants. In the case of antisense constructs, for example, expression may 
refer to the transcription of the antisense DNA only. 

Gene refers to a coding sequence and associated regulatory sequences wherein the 
coding sequence is transcribed into RNA such as mRNA, rRNA, tRNA, snRNA, sense RNA 
or antisense RNA. Examples of regulatory sequences are promoter sequences, 5' and 3' 
untranslated sequences and 

Herbicide: a chemical substance used to kill or suppress the growth of plants, plant 
cells, plant seeds, or plant tissues. 




Heterologous DNA Sequence: a DNA sequence not naturally associated with a 
host cell into which it is introduced, including non-naturally occurring multiple copies of a 
naturally occurring DNA sequence. 

Homologous DNA Sequence: a DNA sequence naturally associated with a host cell 
into which it is introduced. 

Inhibitor: a chemical substance that inactivates the enzymatic activity of a protein 
such as a biosynthetic enzyme, receptor, signal transduction protein, structural gene 
product, or transport protein that is essential to the growth or survival of the plant. In the 
context of the instant invention, an inhibitor is a chemical substance that inactivates the 
enzymatic activity of lumazine synthase or the bifunctional enzyme GTP cyclohydrolase II / 
DHBP synthase from a plant. The term "herbicide" is used herein to define an inhibitor 
when applied to plants, plant cells, plant seeds, or plant tissues. 

Isolated: in the context of the present invention, an isolated DNA molecule or an 
isolated enzyme is a DNA molecule or enzyme that, by the hand of man, exists apart from its 
native environment and is therefore not a product of nature. An isolated DNA molecule or 
enzyme may exist in a purified form or may exist in a non-native environment such as, for 
example, a transgenic host cell. 

Minimal Promoter: promoter elements, particularly a TATA element, that are 
inactive or that have greatly reduced promoter activity in the absence of upstream 
activation. In the presence of a suitable transcription factor, the minimal promoter functions 
to permit transcription. 

Modified Enzyme Activity: enzyme activity different from that which naturally occurs 
in a plant (i.e. enzyme activity that occurs naturally in the absence of direct or indirect 
manipulation of such activity by man), which is tolerant to inhibitors that inhibit the naturally 
occurring enzyme activity. 

Plant refers to any plant, particularly to seed plants. 

Plant cell: structural and physiological unit of the plant, comprising a protoplast and 
a cell wall. The plant cell may be in the form of an isolated single cell or a cultured cell, or a 
part of a higher organized unit such as, for example, a plant tissue or a plant organ. 

Recombinant DNA molecule: a combination of DNA sequences that are joined 
together using recombinant DNA technology. 

Recombinant DNA technology: procedures used to join together DNA sequences 
as described, for example, in Sambrook et al., 1989, Cold Spring Harbor, NY: Cold Spring 
Harbor Laboratory Press. 

Significant Increase: an increase in enzymatic activity that is larger than the margin 
of error inherent in the measurement technique, preferably an increase by about 2-fold or 
greater of the activity of the wild-type enzyme in the presence of the inhibitor, more 
preferably an increase by about 5-fold or greater, and most preferably an increase by about 
10-fold or greater. 

Significantly less: means that the amount of a product of an enzymatic reaction is 
reduced by more than the margin of error inherent in the measurement technique, preferably a 
decrease by about 2-fold or greater of the activity of the wild-type enzyme in the absence of 
the inhibitor, more preferably a decrease by about 5-fold or greater, and most preferably 
a decrease by about 10-fold or greater. 
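
The tiered thresholds in the two definitions above can be sketched as a simple classification. This is only an illustration: the function name, the tier labels and the treatment of "about 2-fold", "about 5-fold" and "about 10-fold" as hard cutoffs are assumptions for the example, and the margin of error must be supplied by whoever performs the measurement.

```python
def increase_tier(modified: float, wild_type: float, margin_of_error: float) -> str:
    """Classify an activity increase relative to the wild-type enzyme.

    Both activities are assumed to be measured in the presence of the inhibitor
    and expressed in the same units.
    """
    if wild_type <= 0:
        raise ValueError("wild_type activity must be positive")
    # An increase within the measurement error is not considered significant.
    if modified - wild_type <= margin_of_error:
        return "not significant"
    fold = modified / wild_type
    if fold >= 10:
        return "most preferred"
    if fold >= 5:
        return "more preferred"
    if fold >= 2:
        return "preferred"
    return "significant"
```

The same logic applies to "significantly less" by swapping the roles of the two activities, so that the fold value expresses a decrease rather than an increase.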

Substantially Similar: in the context of the present invention, a DNA molecule that 
has at least 60 percent sequence identity with the portion of SEQ ID NO:1 that codes for 
lumazine synthase, i.e. that portion of SEQ ID NO:1 that encodes the amino acid sequence 
of SEQ ID NO:2; or a DNA molecule that has at least 60 percent sequence identity with the 
portion of SEQ ID NO:13 that codes for the bifunctional GTP cyclohydrolase II / DHBP 
synthase enzyme from a plant, i.e. that portion of SEQ ID NO:13 that encodes the amino 
acid sequence of SEQ ID NO:14. A substantially similar lumazine synthase nucleotide 
sequence hybridizes specifically to SEQ ID NO:1 or fragments thereof under the following 
conditions: hybridization at 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM 
EDTA at 50°C; wash with 2X SSC, 1% SDS, at 50°C. A substantially similar plant GTP 
cyclohydrolase II / DHBP synthase nucleotide sequence hybridizes specifically to SEQ ID 
NO:13 or fragments thereof under the above conditions. With respect to proteins, 
"substantially similar" as used herein means a protein sequence that is at least 90% 
identical to either the amino acid sequence set forth in SEQ ID NO:2 or the amino acid 
sequence set forth in SEQ ID NO:14. 
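
To make the percent-identity thresholds concrete, the following minimal sketch computes identity over two sequences that have already been aligned (gaps written as "-"). The specification does not state which alignment method or which denominator is used, so the choice here (matches divided by the number of aligned columns, ignoring columns that are gaps in both sequences) and the function names are assumptions for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def is_substantially_similar_dna(aligned_a: str, aligned_b: str) -> bool:
    """Apply the 60 percent threshold stated for DNA molecules."""
    return percent_identity(aligned_a, aligned_b) >= 60.0


def is_substantially_similar_protein(aligned_a: str, aligned_b: str) -> bool:
    """Apply the 90 percent threshold stated for protein sequences."""
    return percent_identity(aligned_a, aligned_b) >= 90.0
```

Note that the definition also includes a hybridization criterion for nucleotide sequences, which is an experimental test and is not captured by this calculation.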

Substrate: a substrate is the molecule that the enzyme naturally recognizes and 
converts to a product in the biochemical pathway in which the enzyme naturally carries out 
its function, or is a modified version of the molecule, which is also recognized by the 
enzyme and is converted by the enzyme to a product in an enzymatic reaction similar to the 
naturally-occurring reaction. 

Synthetic refers to a nucleotide sequence comprising structural characters that are 
not present in the natural sequence. For example, an artificial sequence that resembles 
more closely the G+C content and the normal codon distribution of dicot and/or monocot 
genes is said to be synthetic. 

Tolerance: the ability to continue normal growth or function when exposed to an 
inhibitor or herbicide. 

Transformation: a process for introducing heterologous DNA into a cell, tissue, or 
plant. Transformed cells, tissues, or plants are understood to encompass not only the end 
product of a transformation process, but also transgenic progeny thereof. 

Transgenic: stably transformed with a recombinant DNA molecule that preferably 
comprises a suitable promoter operatively linked to a DNA sequence of interest. 



In view of the above, one object of the invention is to provide methods for identifying 
new or improved herbicides. Another object of the invention is to provide methods for using 
such new or improved herbicides to suppress the growth of plants such as weeds. Still 
another object of the invention is to provide improved crop plants that are tolerant to such 
new or improved herbicides. 

In furtherance of these and other objects, the present invention provides a DNA 
molecule comprising a nucleotide sequence isolated from a plant that encodes an enzyme 
involved in riboflavin biosynthesis, wherein the enzyme has either lumazine synthase 
activity or bifunctional GTP cyclohydrolase II / DHBP synthase activity. 

According to one embodiment, the present invention provides a DNA molecule 
comprising a nucleotide sequence isolated from a plant that encodes the β subunit of 
riboflavin synthase (lumazine synthase). For example, the DNA molecule of the invention 
may comprise a nucleotide sequence that encodes an enzyme having lumazine synthase 
activity, wherein the enzyme comprises an amino acid sequence substantially similar to the 
amino acid sequence set forth in SEQ ID NO:2. In another example, the DNA molecule of 
the invention comprises a nucleotide sequence that encodes an enzyme having lumazine 
synthase activity, wherein the enzyme comprises the amino acid sequence set forth in SEQ 
ID NO:2. In another example, the DNA molecule of the invention comprises a nucleotide 
sequence isolated from a plant that encodes an enzyme having lumazine synthase activity, 
wherein said DNA molecule hybridizes to a DNA molecule that encodes the amino acid 
sequence set forth in SEQ ID NO:2 under the following conditions: hybridization at 7% 
sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM EDTA at 50°C; wash with 2X 
SSC, 1% SDS, at 50°C. The invention further provides a DNA molecule comprising a 
nucleotide sequence isolated from a plant that encodes an enzyme involved in riboflavin 
biosynthesis, wherein the enzyme has lumazine synthase activity, wherein said DNA 
molecule hybridizes to the coding sequence set forth in SEQ ID NO:1 under the following 
conditions: hybridization at 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM 
EDTA at 50°C; wash with 2X SSC, 1% SDS, at 50°C. In yet another example, the DNA 
molecule of the invention comprises a nucleotide sequence that is substantially similar to 
the coding sequence set forth in SEQ ID NO:1 and that encodes an enzyme having 
lumazine synthase activity. In a further example, the DNA molecule of the invention 
comprises a nucleotide sequence isolated from a plant that encodes an enzyme having 
lumazine synthase activity, wherein said DNA molecule comprises a 20 base pair nucleotide 
portion identical in sequence to a consecutive 20 base pair portion of the coding sequence 
set forth in SEQ ID NO:1. In still another example, the DNA molecule of the invention 
comprises the coding sequence set forth in SEQ ID NO:1 and encodes an enzyme having 
lumazine synthase activity. Although the nucleotide sequence provided in SEQ ID NO:1 
that encodes lumazine synthase was isolated from Arabidopsis thaliana, using the 
information provided by the present invention, the nucleotide sequence that encodes an 
enzyme having lumazine synthase activity can be obtained from any plant using standard 
methods known in the art. 
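
The "consecutive 20 base pair portion" criterion in this paragraph amounts to asking whether two sequences share at least one identical 20-nucleotide window. A minimal sketch of that check follows; the function name is hypothetical, the comparison is done on one strand only, and the sequences in any usage would be supplied by the reader (the toy strings used to exercise it are not taken from the sequence listing).

```python
def shares_consecutive_portion(candidate: str, reference: str, length: int = 20) -> bool:
    """Return True if candidate contains a window of `length` nucleotides
    that is identical to a consecutive window of reference."""
    # Normalise: drop the spaces used in sequence listings and ignore case.
    candidate = "".join(candidate.split()).lower()
    reference = "".join(reference.split()).lower()
    if len(candidate) < length or len(reference) < length:
        return False
    # Index every window of the reference once, then scan the candidate.
    reference_windows = {
        reference[i:i + length] for i in range(len(reference) - length + 1)
    }
    return any(
        candidate[i:i + length] in reference_windows
        for i in range(len(candidate) - length + 1)
    )
```

Storing the reference windows in a set keeps the scan linear in the combined sequence length, which is adequate for coding sequences of roughly a thousand nucleotides.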

According to another embodiment, the present invention provides a DNA molecule 
comprising a nucleotide sequence isolated from a plant that encodes the bifunctional GTP 
cyclohydrolase II / DHBP synthase. For example, the DNA molecule of the invention may 
comprise a nucleotide sequence that encodes an enzyme having bifunctional GTP 
cyclohydrolase II / DHBP synthase activity, wherein the enzyme comprises an amino acid 
sequence substantially similar to the amino acid sequence set forth in SEQ ID NO:14. In 
another example, the DNA molecule of the invention comprises a nucleotide sequence that 
encodes an enzyme having bifunctional GTP cyclohydrolase II / DHBP synthase activity, 
wherein the enzyme comprises the amino acid sequence set forth in SEQ ID NO:14. 
In another example of the invention the DNA molecule comprises a nucleotide sequence 
isolated from a plant that encodes an enzyme having bifunctional GTP cyclohydrolase II / 
DHBP synthase activity, wherein said DNA molecule hybridizes to a DNA molecule that 
encodes the amino acid sequence set forth in SEQ ID NO:14 under the following 
conditions: hybridization at 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM 
EDTA at 50°C; wash with 2X SSC, 1% SDS, at 50°C. The invention further provides a DNA 
molecule comprising a nucleotide sequence isolated from a plant that encodes an enzyme 
involved in riboflavin biosynthesis, wherein the enzyme has bifunctional GTP 
cyclohydrolase II / DHBP synthase activity, wherein said DNA molecule hybridizes to the 
coding sequence set forth in SEQ ID NO:13 under the following conditions: hybridization at 
7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM EDTA at 50°C; wash with 2X 
SSC, 1% SDS, at 50°C. 

In yet another example, the DNA molecule of the invention comprises a nucleotide 
sequence that is substantially similar to the coding sequence set forth in SEQ ID NO:13 and 
that encodes an enzyme having bifunctional GTP cyclohydrolase II / DHBP synthase 
activity. In a further example, the DNA molecule of the invention comprises a nucleotide 
sequence isolated from a plant that encodes an enzyme having bifunctional GTP 
cyclohydrolase II / DHBP synthase activity, wherein said DNA molecule comprises a 20 
base pair nucleotide portion identical in sequence to a consecutive 20 base pair portion of 
the coding sequence set forth in SEQ ID NO:13. In still another example, the DNA 
molecule of the invention comprises the coding sequence set forth in SEQ ID NO:13 and 
encodes an enzyme having bifunctional GTP cyclohydrolase II / DHBP synthase activity. 
Although the nucleotide sequence provided in SEQ ID NO:13 that encodes the bifunctional 
GTP cyclohydrolase II / DHBP synthase was isolated from Arabidopsis thaliana, using the 
information provided by the present invention, the nucleotide sequence that encodes an 
enzyme having bifunctional GTP cyclohydrolase II / DHBP synthase activity can be obtained 
from any plant using standard methods known in the art. 

The present invention also provides a chimeric gene comprising a promoter 
operatively linked to a DNA molecule of the invention. Further, the present invention 
provides a recombinant vector comprising such a chimeric gene, wherein the vector is 
capable of being stably transformed into a host cell. Still further, the present invention 
provides a host cell comprising such a vector, wherein the host cell is capable of expressing 
the DNA molecule encoding an enzyme involved in riboflavin biosynthesis. A host cell 
according to the invention may be a bacterial cell, a yeast cell, or a plant cell. In particular, the 
host cell according to the invention may be a bacterial cell. 

The present invention further provides a process for producing nucleotide sequences 
encoding gene products having altered lumazine synthase activity comprising: (a) shuffling 
a DNA molecule from a plant that encodes an enzyme having lumazine synthase activity, 
(b) expressing the resulting shuffled nucleotide sequences, and (c) selecting for altered 
lumazine synthase activity as compared to the activity of an enzyme encoded by the 
unshuffled DNA molecule. Preferably, the nucleotide sequence shuffled according to this 
method is SEQ ID NO:1. The invention is also directed to a shuffled DNA molecule 
obtainable by this process. Preferably, the shuffled DNA molecule encodes an enzyme 
having enhanced tolerance to an inhibitor of lumazine synthase activity. The present 
invention also provides a chimeric gene comprising a promoter operatively linked to a 
shuffled DNA molecule; a recombinant vector comprising said chimeric gene, wherein said 
vector is capable of being stably transformed into a host cell; a host cell comprising said 
vector. Said host cell is preferably a bacterial cell, a yeast cell, or a plant cell, especially a 
plant cell. The invention is also directed to a plant or seed comprising such a plant cell. 
Preferably, said plant is tolerant to an inhibitor of lumazine synthase activity. 

The present invention further provides a process for producing nucleotide sequences 
encoding gene products having altered bifunctional GTP cyclohydrolase II / DHBP synthase 
activity comprising: (a) shuffling a DNA molecule from a plant that encodes an enzyme 
having bifunctional GTP cyclohydrolase II / DHBP synthase activity, (b) expressing the 
resulting shuffled nucleotide sequences, and (c) selecting for altered bifunctional GTP 
cyclohydrolase II / DHBP synthase activity as compared to the activity of an enzyme 
encoded by the unshuffled DNA molecule. Preferably, the nucleotide sequence shuffled 
according to this method is SEQ ID NO:13. The invention is also directed to a shuffled 
DNA molecule obtainable by this process. Preferably, the shuffled DNA molecule encodes 
an enzyme having enhanced tolerance to an inhibitor of bifunctional GTP cyclohydrolase II / 
DHBP synthase activity. The present invention also provides a chimeric gene comprising a 
promoter operatively linked to a shuffled DNA molecule; a recombinant vector comprising 
said chimeric gene, wherein said vector is capable of being stably transformed into a host 
cell; a host cell comprising said vector. Said host cell is preferably a bacterial cell, a yeast 
cell, or a plant cell, especially a plant cell. The invention is also directed to a plant or seed 
comprising such a plant cell. Preferably, said plant is tolerant to an inhibitor of bifunctional 
GTP cyclohydrolase II / DHBP synthase activity. 

In accordance with another embodiment, the present invention also relates to the 
recombinant production of the above-described riboflavin biosynthesis enzymes and 
methods of use thereof. In particular, the present invention provides an isolated plant 
enzyme involved in riboflavin biosynthesis, wherein the enzyme has lumazine synthase 
activity. Preferably this enzyme comprises an amino acid sequence substantially similar to 
the amino acid sequence set forth in SEQ ID NO:2. More preferably, this enzyme 
comprises the amino acid sequence set forth in SEQ ID NO:2. The present invention also 
provides an isolated plant enzyme involved in riboflavin biosynthesis, wherein the enzyme 
has bifunctional GTP cyclohydrolase II / DHBP synthase activity. Preferably, this enzyme 
comprises an amino acid sequence substantially similar to the amino acid sequence set 
forth in SEQ ID NO:14. More preferably, this enzyme comprises the amino acid sequence 
set forth in SEQ ID NO:14. 

The present invention further provides methods of using purified plant riboflavin 
biosynthesis enzymes such as lumazine synthase and bifunctional GTP cyclohydrolase II / 
DHBP synthase to screen for novel inhibitors thereof, which can then be used as herbicides 
to suppress the growth of undesirable vegetation in fields where crops are grown, 
particularly agronomically important crops such as maize and other cereal crops such as 
wheat, oats, rye, sorghum, rice, barley, millet, turf and forage grasses, and the like, as well 
as cotton, sugar cane, sugar beet, oilseed rape, and soybeans. 

With regard to lumazine synthase, such a screen for chemicals having the ability to inhibit 
lumazine synthase activity preferably comprises the steps of: (a) combining an enzyme 
having lumazine synthase activity in a first reaction mixture with 2,4-dioxy-5-amino-6- 
ribitylamino-pyrimidine and 3,4-dihydroxy-2-butanone phosphate under conditions in which 
the enzyme is capable of catalyzing the synthesis of lumazine; (b) combining the chemical 
and the enzyme in a second reaction mixture with 2,4-dioxy-5-amino-6-ribitylamino- 
pyrimidine and 3,4-dihydroxy-2-butanone phosphate under the same conditions as in the 
first reaction mixture; (c) determining the amounts of lumazine produced in the first and 
second reaction mixtures; and (d) comparing the amounts of lumazine produced in the first 
and second reaction mixtures; wherein the chemical is capable of inhibiting the lumazine 
synthase activity of the enzyme if the amount of lumazine produced in the second reaction 
mixture is significantly less than the amount of lumazine produced in the first reaction 
mixture. Preferred is a method for screening according to the invention wherein the first 
reaction mixture comprises 50 µM 2,4-dioxy-5-amino-6-ribitylamino-pyrimidine, and 0.5 mM 
3,4-dihydroxy-2-butanone phosphate. Further preferred is a method for screening according 
to the invention, wherein the amounts of lumazine produced in the reaction mixtures are 
determined using a fluorimeter at an excitation wavelength of 407 nm. 

With regard to the bifunctional GTP cyclohydrolase II / DHBP synthase, such a 
screen for chemicals having the ability to inhibit GTP cyclohydrolase II / DHBP synthase 
activity preferably comprises the steps of: (a) combining an enzyme having GTP 
cyclohydrolase II / DHBP synthase activity in a first reaction mixture with GTP or ribulose-5- 
phosphate under conditions in which the enzyme is capable of catalyzing the synthesis of 
2,5-diamino-4-oxy-6-ribosylamino-pyrimidine-5'-phosphate or 3,4-dihydroxy-2-butanone 
phosphate, respectively; (b) combining the chemical and the enzyme in a second reaction 
mixture with GTP or ribulose-5-phosphate under the same conditions as in the first reaction 
mixture; (c) determining the amounts of 2,5-diamino-4-oxy-6-ribosylamino-pyrimidine-5'- 
phosphate or 3,4-dihydroxy-2-butanone phosphate produced in the first and second 
reaction mixtures; and (d) comparing the amounts of 2,5-diamino-4-oxy-6-ribosylamino- 
pyrimidine-5'-phosphate or 3,4-dihydroxy-2-butanone phosphate produced in the first and 
second reaction mixtures; wherein the chemical is capable of inhibiting the bifunctional GTP 
cyclohydrolase II / DHBP synthase activity of the enzyme if the amount of 2,5-diamino-4- 
oxy-6-ribosylamino-pyrimidine-5'-phosphate or 3,4-dihydroxy-2-butanone phosphate 
produced in the second reaction mixture is significantly less than the amount of 2,5- 
diamino-4-oxy-6-ribosylamino-pyrimidine-5'-phosphate or 3,4-dihydroxy-2-butanone 
phosphate produced in the first reaction mixture. 
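
By way of a non-limiting illustration, the comparison made in step (d) of either screening method can be sketched in Python as follows. The function names and the 50% cutoff are assumptions made for this sketch only; the description requires that the second reaction mixture yield "significantly less" product and does not fix a numerical threshold.

```python
def percent_inhibition(control_amount: float, test_amount: float) -> float:
    """Percent reduction in product formed in the second (chemical-containing)
    reaction mixture relative to the first (control) reaction mixture."""
    if control_amount <= 0:
        raise ValueError("control amount must be positive")
    return 100.0 * (1.0 - test_amount / control_amount)


def is_candidate_inhibitor(control_amount: float,
                           test_amount: float,
                           threshold_percent: float = 50.0) -> bool:
    """Flag the chemical when inhibition meets a chosen cutoff.
    The default cutoff is a hypothetical value, not one from the description."""
    return percent_inhibition(control_amount, test_amount) >= threshold_percent


# Example readings in arbitrary units (e.g. fluorescence), invented for illustration.
print(percent_inhibition(200.0, 50.0))
print(is_candidate_inhibitor(200.0, 190.0))
```

In practice the cutoff would be set from replicate measurements and an appropriate statistical test rather than from a single fixed percentage.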

The present invention also embodies herbicidal chemicals identified by the above 
screening methods in addition to methods for suppressing the growth of plants by applying 
such herbicidal chemicals to the plants, whereby the chemicals inhibit the activity of 
lumazine synthase or bifunctional GTP cyclohydrolase II / DHBP synthase in the plants. 

The present invention further embodies plants, plant tissues, plant seeds, and plant 
cells that have modified riboflavin biosynthesis enzyme activity and that are therefore 
tolerant to inhibition by a herbicide at levels normally inhibitory to naturally occurring 
riboflavin biosynthesis enzyme activity. Herbicide tolerant plants encompassed by the 
invention include those that would otherwise be potential targets for normally inhibiting 
herbicides, particularly the agronomically important crops mentioned above. According to 
this embodiment, plants, plant tissue, plant seeds, or plant cells are stably transformed with 
a recombinant DNA molecule comprising a suitable promoter functional in plants operatively 
linked to a nucleotide coding sequence that encodes a modified riboflavin biosynthesis 
enzyme that is tolerant to inhibition by a herbicide at a concentration that would normally 
inhibit the activity of wild-type, unmodified riboflavin biosynthesis enzyme. Modified 
riboflavin biosynthesis enzyme activity may also be conferred upon a plant by increasing 
expression of wild-type herbicide-sensitive riboflavin biosynthesis enzyme by providing 
multiple copies of wild-type riboflavin biosynthesis genes to the plant or by overexpression 
of wild-type riboflavin biosynthesis genes under control of a stronger-than-wild-type 
promoter. The transgenic plants, plant tissue, plant seeds, or plant cells thus created are 
then selected by conventional selection techniques, whereby herbicide tolerant lines are 
isolated, characterized, and developed. Alternately, random or site-specific mutagenesis 
may be used to generate herbicide tolerant lines. 

Therefore, the present invention provides a plant, plant cell, plant seed, or plant tissue 
comprising a DNA molecule comprising a nucleotide sequence isolated from a plant that 
encodes an enzyme involved in riboflavin biosynthesis, wherein the enzyme has lumazine 
synthase activity and wherein the DNA molecule confers upon the plant, plant cell, plant 
seed, or plant tissue tolerance to a herbicide in amounts that normally inhibit naturally occurring 
lumazine synthase activity. According to one example of this embodiment, the enzyme 
comprises an amino acid sequence substantially similar to the amino acid sequence set 
forth in SEQ ID NO:2. According to another example of this embodiment, the DNA 
molecule is substantially similar to the coding sequence set forth in SEQ ID NO:1. In a 
related aspect, the present invention is directed to a method for selectively suppressing the 
growth of weeds in a field containing a crop of planted crop seeds or plants, comprising the 
steps of: (a) planting herbicide tolerant crops or crop seeds, which are plants or plant seeds 
that are tolerant to a herbicide that inhibits naturally occurring lumazine synthase activity; 
and (b) applying to the crops or crop seeds and the weeds in the field a herbicide in 
amounts that inhibit naturally occurring lumazine synthase activity, wherein the herbicide 
suppresses the growth of the weeds without significantly suppressing the growth of the 
crops. 

The present invention further provides a plant, plant cell, plant seed, or plant tissue 
comprising a DNA molecule comprising a nucleotide sequence isolated from a plant that 
encodes an enzyme involved in riboflavin biosynthesis, wherein the enzyme has 
bifunctional GTP cyclohydrolase II / DHBP synthase activity and wherein the DNA molecule 
confers upon the plant, plant cell, plant seed, or plant tissue tolerance to a herbicide in 
amounts that normally inhibit naturally occurring bifunctional GTP cyclohydrolase II / DHBP 
synthase activity. According to one example of this embodiment, the enzyme comprises an 
amino acid sequence substantially similar to the amino acid sequence set forth in SEQ ID 
NO:14. According to another example of this embodiment, the DNA molecule is 
substantially similar to the coding sequence set forth in SEQ ID NO:13. In a related aspect, 
the present invention is directed to a method for selectively suppressing the growth of 
weeds in a field containing a crop of planted crop seeds or plants, comprising the steps of: 
(a) planting herbicide tolerant crops or crop seeds, which are plants or plant seeds that are 
tolerant to a herbicide that inhibits naturally occurring bifunctional GTP cyclohydrolase II / 
DHBP synthase activity; and (b) applying to the crops or crop seeds and the weeds in the 
field a herbicide in amounts that inhibit naturally occurring bifunctional GTP cyclohydrolase 
II / DHBP synthase activity, wherein the herbicide suppresses the growth of the weeds 
without significantly suppressing the growth of the crops. 

Other objects and advantages of the present invention will become apparent to those 
skilled in the art from a study of the following description of the invention and non-limiting 
examples. 

I. Plant Riboflavin Biosynthesis Genes 

In one aspect, the present invention is directed to a DNA molecule comprising a 
nucleotide sequence isolated from a plant source that encodes the β subunit of riboflavin 
synthase (lumazine synthase). In particular, the present invention provides a DNA molecule 
isolated from Arabidopsis thaliana that encodes lumazine synthase and DNA molecules 
substantially similar thereto that encode enzymes having lumazine synthase activity. The 
DNA coding sequence for lumazine synthase from Arabidopsis thaliana is provided in SEQ 
ID NO:1. 

In another aspect, the present invention is directed to a DNA molecule comprising a 
nucleotide sequence isolated from a plant source that encodes the bifunctional enzyme 
GTP cyclohydrolase II / 3,4-dihydroxy-2-butanone phosphate (DHBP) synthase. In particular, the 
present invention provides a DNA molecule isolated from Arabidopsis thaliana that encodes 
this bifunctional enzyme and DNA molecules substantially similar thereto that encode 
enzymes having GTP cyclohydrolase II / DHBP synthase activity. The DNA coding 
sequence for GTP cyclohydrolase II / DHBP synthase from Arabidopsis thaliana is provided 
in SEQ ID NO:13. The present invention represents the first recognition that in plants, GTP 
cyclohydrolase II and DHBP synthase constitute a single, bifunctional enzyme. 

Based on Applicants' disclosure of the present invention, DNA sequences encoding 
riboflavin biosynthesis enzymes can, for the first time, be isolated from the genome of any 
desired plant species. An exemplary method for isolating riboflavin biosynthesis genes from 
plants is described in Examples 1 and 11. With this method, searches of the Arabidopsis 
thaliana Expressed Sequence Tag (EST) database (Arabidopsis Biological Resource 
Center at Ohio State, Ohio State University, Columbus, OH) revealed ESTs with 
homologies to the E. coli riboflavin synthase β subunit and B. subtilis GTP cyclohydrolase. 
DNA fragments generated by PCR with primers specific to these ESTs were used to probe 
an Arabidopsis lambda ZAP library, whereupon cDNAs were isolated. The determined 
protein sequence encoded by one cDNA showed approximately 68% similarity to both the 
E. coli and B. subtilis riboflavin synthase β subunit. The determined protein sequence 
encoded by another cDNA showed approximately 70% similarity to the B. subtilis GTP 
cyclohydrolase. 

Alternatively, riboflavin biosynthesis gene sequences can be isolated from any plant 
according to well known techniques based on their sequence similarity to the Arabidopsis 
thaliana coding sequences (SEQ ID NOs:1 and 13) taught by the present invention. In 
these techniques, all or part of a known plant riboflavin biosynthesis gene's coding 
sequence is used as a probe that selectively hybridizes to other riboflavin biosynthesis gene 
sequences present in a population of cloned genomic DNA fragments or cDNA fragments 
(i.e. genomic or cDNA libraries) from a chosen plant. Such techniques include hybridization 
screening of plated DNA libraries (either plaques or colonies; see, e.g., Sambrook et al., 
"Molecular Cloning", eds., Cold Spring Harbor Laboratory Press (1989)) and amplification 
by PCR using oligonucleotide primers corresponding to sequence domains conserved 
among known riboflavin biosynthesis enzymes' amino acid sequences (see, e.g., Innis et al., 
"PCR Protocols, a Guide to Methods and Applications", pub. by Academic Press (1990)). 
These methods are particularly well suited to the isolation of riboflavin biosynthesis gene 
sequences from organisms closely related to the organism from which the probe sequence 
is derived. Thus, application of these methods using the Arabidopsis coding sequences as 
probes would be expected to be particularly well suited for the isolation of riboflavin 
biosynthesis gene sequences from other plant species, including monocotyledons and 
dicotyledons. 

The isolated riboflavin biosynthesis gene sequences taught by the present invention 
can be manipulated according to standard genetic engineering techniques to suit any 
desired purpose. For example, an entire plant riboflavin biosynthesis gene sequence or 
portions thereof may be used as a probe capable of specifically hybridizing to coding 
sequences and messenger RNAs. To achieve specific hybridization under a variety of 
conditions, such probes include sequences that are unique among plant riboflavin 
biosynthesis gene sequences and are at least 10 nucleotides in length, preferably at least 
20 nucleotides in length, and most preferably at least 50 nucleotides in length. Such 
probes may be used to amplify and analyze riboflavin biosynthesis gene sequences from a 
chosen organism via PCR. This technique may be useful to isolate additional riboflavin 
biosynthesis gene sequences from a desired organism or as a diagnostic assay to 
determine the presence of riboflavin biosynthesis gene sequences in an organism. This 
technique may also be used to detect the presence of altered riboflavin biosynthesis gene 
sequences associated with a particular condition of interest such as herbicide tolerance, 
poor health, etc. 

Lumazine synthase-specific and GTP cyclohydrolase II / DHBP synthase-specific 
hybridization probes can also be used to map the location of these native genes in the 
genome of a chosen plant using standard techniques based on the selective hybridization 
of the probe to genomic sequences. These techniques include, but are not limited to, 
identification of DNA polymorphisms identified or contained within the probe sequence, and 
use of such polymorphisms to follow segregation of the gene relative to other markers of 
known map position in a mapping population derived from self fertilization of a hybrid of two 
polymorphic parental lines (see, e.g., Helentjaris et al., Plant Mol. Biol. 5:109 (1985); 
Sommer et al., Biotechniques 12:82 (1992); D'Ovidio et al., Plant Mol. Biol. 15:169 (1990)). 
While any plant riboflavin biosynthesis gene sequence is contemplated to be useful as a 
probe for mapping riboflavin biosynthesis genes, preferred probes are those gene 
sequences from plant species more closely related to the chosen plant species, and most 
preferred probes are those gene sequences from the chosen plant species. Mapping of 
riboflavin biosynthesis genes in this manner is contemplated to be particularly useful for 
breeding purposes. For instance, by knowing the genetic map position of a mutant 
riboflavin biosynthesis gene that confers herbicide resistance, flanking DNA markers can be 
identified from a reference genetic map (see, e.g., Helentjaris, Trends Genet. 3:217 
(1987)). During introgression of the herbicide resistance trait into a new breeding line, 
these markers can then be used to monitor the extent of linked flanking chromosomal DNA 
still present in the recurrent parent after each round of back-crossing. 
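
As a point of reference for the marker-assisted monitoring described above, the expected share of donor-derived genome at loci unlinked to the selected trait halves with each generation of back-crossing. The Python sketch below is offered only as an illustration of that textbook expectation; the function name is hypothetical, and DNA linked to the resistance gene is retained longer than this average, which is why flanking markers are useful.

```python
def expected_donor_fraction(backcrosses: int) -> float:
    """Expected fraction of donor genome at loci unlinked to the selected gene.

    backcrosses = 0 corresponds to the F1 (one half donor); each back-cross to
    the recurrent parent halves the expected donor share again.
    """
    if backcrosses < 0:
        raise ValueError("number of back-crosses cannot be negative")
    return 0.5 ** (backcrosses + 1)


# Expected donor share in the F1 and after each of the first four back-crosses.
for generation in range(5):
    print(generation, expected_donor_fraction(generation))
```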

Lumazine synthase-specific and GTP cyclohydrolase II / DHBP synthase-specific 
hybridization probes can also be used to quantify levels of riboflavin biosynthesis gene 
mRNA in a plant using standard techniques such as Northern blot analysis. This technique 
is useful as a diagnostic assay to detect altered levels of riboflavin biosynthesis gene 
expression that are associated with particular conditions such as enhanced tolerance to 
herbicides that target riboflavin biosynthesis genes. 

II. Essentiality of Riboflavin Biosynthesis Genes in Plants Demonstrated by Antisense 
Inhibition 

As shown in the examples below, the essentiality of riboflavin biosynthesis genes for 
normal plant growth and development has been demonstrated by antisense inhibition of 
lumazine synthase in plants using the antisense validation system described in co-owned 
and co-pending application serial no. 08/978,bvi0 [entitled "Methods and Compositions 
Useful for the Activation of Silent Transgenes", filed Nov. 26, 1997], incorporated herein by 
reference. In this system, a hybrid transcription factor gene is made that comprises a DNA- 
binding domain and an activation domain. In addition, an activatable DNA construct is 
made that comprises a synthetic promoter operatively linked to an activatable DNA 
sequence. The hybrid transcription factor gene and synthetic promoter are selected or 
designed such that the DNA binding domain of the hybrid transcription factor is capable of 
binding specifically to the synthetic promoter, which then activates expression of the 
activatable DNA sequence. A first plant is transformed with the hybrid transcription factor 
gene, and a second plant is transformed with the activatable DNA construct. The first plant 
and second plants are crossed to produce a progeny plant containing both the sequence 
encoding the hybrid transcription factor and the synthetic promoter, wherein the activatable 
DNA sequence is expressed in the progeny plant. In the preferred embodiment, the 
activatable DNA sequence is an antisense sequence capable of inactivating expression of 
an endogenous gene such as the lumazine synthase gene or the bifunctional GTP 
cyclohydrolase II / DHBP synthase gene. Hence, the progeny plant will be unable to 
normally express the endogenous gene. 

This antisense validation system is especially useful for allowing expression of traits 
that might otherwise be unrecoverable as constitutively driven transgenes. For instance, 
foreign genes with potentially lethal effect or antisense genes or dominant-negative 
mutations designed to abolish function of essential genes, while of great interest in basic 
studies of plant biology, present inherent experimental problems. Decreased transformation 
frequencies are often cited as evidence of lethality associated with a particular constitutively 
driven transgene, but negative results of this type are laden with alternative trivial 
explanations. The present invention is an important advancement in the field of agriculture 
because it allows stable maintenance and propagation of a test transgene separate from its 
expression. This ability to separate transgene insertion from expression is especially useful 
because it allows firm conclusions to be drawn about the essentiality of gene function. A 
substantial benefit of the present invention is that plant genes essential for normal growth or 
development can thus be identified. The identification of such genes provides useful targets for 
screening compound libraries for effective herbicides. Below, the antisense validation 
system is described in greater detail: 

A. Hybrid Transcription Factor Gene 

A hybrid transcription factor gene for use in the antisense validation system 
described herein comprises DNA sequences encoding (1) a DNA-binding domain and (2) an 
activation domain that interacts with components of transcriptional machinery assembling at 
a promoter. Gene fragments are joined, typically such that the DNA binding domain is 
toward the 5' terminus and the activator domain is toward the 3' terminus, to form a hybrid 
gene whose expression produces a hybrid transcription factor. One skilled in the art is 
capable of routinely combining various DNA sequences encoding DNA binding domains 
with various DNA sequences encoding activation domains to produce a wide array of hybrid 
transcription factor genes. Examples of DNA sequences encoding DNA binding domains 
include, but are not limited to, those encoding the DNA binding domains of GAL4, 
bacteriophage 434, lexA, lacI, and phage lambda repressor. Examples of DNA sequences 
encoding the activation domain include, but are not limited to, those encoding the acidic 
activation domains of herpes simplex VP16, maize C1, and P1. In addition, suitable 
activation domains can be isolated by fusing DNA pieces from an organism of choice to a 
suitable DNA binding domain and selecting directly for function (Estruch et al., (1994) 
Nucleic Acids Res. 22: 3983-3989). Domains of transcriptional activator proteins can be 
swapped between proteins of diverse origin (Brent and Ptashne (1985) Cell 43: 729-736). 
A preferable hybrid transcription factor gene comprises DNA sequences encoding the GAL4 
DNA binding domain fused to the maize C1 activation domain. 

B. Activatable DNA Construct 

An activatable DNA construct for use in the antisense validation system described 
herein comprises (1) a synthetic promoter operatively linked to (2) an activatable DNA 
sequence. The synthetic promoter comprises at least one DNA binding site recognized by 
the DNA binding domain of the hybrid transcription factor, and a minimal promoter, 
preferably a TATA element derived from a promoter recognized by plant cells. More 
particularly the TATA element is derived from a promoter recognized by the plant cell type 
into which the synthetic promoter will be incorporated. Desirably, the DNA binding site is 
repeated multiple times in the synthetic promoter so that the minimal promoter may be more 
effectively activated, such that the activatable DNA sequence associated with the synthetic 
promoter is more effectively expressed. One skilled in the art can use routine molecular 
biology and recombinant DNA technology to make desirable synthetic promoters. 
Examples of DNA binding sites that can be used to make synthetic promoters useful in the 
invention include, but are not limited to, the upstream activating sequence (UASg) 
recognized by the GAL4 DNA binding protein, the lac operator, and the lexA binding site. 
Examples of promoter TATA elements recognized by plant cells include those derived from 
CaMV 35S, the maize Bz1 promoter, and the UBQ3 promoter. An especially preferable 
synthetic promoter comprises a truncated CaMV 35S sequence containing the TATA 
element (nucleotides -59 to +48 relative to the start of transcription), fused at its 5' end to 
approximately 10 concatemeric direct repeats of the upstream activating sequence (UASg) 
recognized by the GAL4 DNA binding domain. 

The activatable DNA sequence encompasses any DNA sequence for which stable 
introduction and expression in a plant cell is desired. Particularly desirable activatable DNA 
sequences are sense or antisense sequences, whose expression results in decreased 
expression of their endogenous counterpart genes, thereby inhibiting normal plant growth or 
development. The activatable DNA sequence is operatively linked to the synthetic promoter 
to form the activatable DNA construct. The activatable DNA sequence in the activatable 
DNA construct is not expressed, i.e. is silent, in transgenic lines, unless a hybrid 
transcription factor capable of binding to and activating the synthetic promoter, is also 
present. The activatable DNA construct subsequently is introduced into cells, tissues or 
plants to form stable transgenic lines expressing the activatable DNA sequence, as 
described more fully below. In the context of the present invention, the activatable DNA 
sequence preferably comprises an antisense lumazine synthase sequence or an antisense 
bifunctional GTP cyclohydrolase II / DHBP synthase sequence. 
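
By way of illustration only, an antisense sequence is commonly obtained by placing a coding sequence in reverse orientation relative to the promoter, so that the transcript is complementary to the endogenous mRNA. The Python sketch below computes such a reverse complement; the function name and the six-base toy sequence are hypothetical and do not correspond to any sequence of the invention.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G),
    i.e. the sequence of the opposite strand read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


# Toy sense-strand fragment invented for illustration.
sense_fragment = "ATGGCC"
antisense_fragment = reverse_complement(sense_fragment)
print(antisense_fragment)
```

Applying the function twice returns the original sequence, which is a convenient sanity check when preparing constructs in silico.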

C. Transgenic Plants Containing the Hybrid Transcription Factor Gene or 
the Activatable DNA Construct 

The antisense validation system described herein utilizes a first plant containing the 
hybrid transcription factor gene and a second plant containing the activatable DNA 
construct. The hybrid transcription factor genes and activatable DNA constructs described 
above are introduced into the plants by methods well known and routinely used in the art, 
including but not limited to crossing, Agrobacterium-mediated transformation, Ti plasmid 
vectors, direct DNA uptake such as microprojectile bombardment, liposome mediated 
uptake, micro-injection, etc. Transformants are screened for the presence and functionality 
of the transgenes according to standard methods known to those skilled in the art. 

D. Transgenic Plants Containing Both the Hybrid Transcription Factor Gene and 
the Activatable DNA Construct 

F1 plants containing both the hybrid transcription factor gene and the activatable 
DNA construct are generated by cross-pollination and selected for the presence of an 
appropriate marker. In contrast to plants containing the activatable DNA construct alone, 
the F1 plants generate high levels of activatable DNA sequence expression product, 
comparable to those obtained with strong constitutive promoters such as CaMV 35S. 



Antisense Validation Assay: 

Thus, a useful assay in the system described herein comprises the following steps: 

a) providing a first transgenic plant stably transformed with a hybrid transcription factor gene 
encoding a hybrid transcription factor capable of activating a synthetic promoter when said 
synthetic promoter is present in the plant, wherein the first transgenic plant is homozygous 
for the hybrid transcription factor; 

b) providing a second transgenic plant stably transformed with an activatable DNA construct 
comprising a synthetic promoter activatable by the hybrid transcription factor of step a) 
operatively linked to an activatable DNA sequence, such as an antisense lumazine 
synthase sequence or an antisense GTP cyclohydrolase II / DHBP synthase sequence; 

c) crossing the first transgenic plant with the second transgenic plant to yield F1 plants 
expressing the activatable DNA sequence in the presence of the hybrid transcription factor; 
and 

d) determining the effect of expression of the activatable DNA sequence on the F1 plants. 
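
The benefit of using a homozygous first plant in step a) can be illustrated with a simple enumeration of gametes. The Python sketch below assumes, for illustration only, that each transgene sits at a single locus, that the two loci are unlinked, and that "-" denotes absence of the transgene; the allele labels and function name are hypothetical.

```python
from fractions import Fraction
from itertools import product


def f1_fraction_with_both(activator_parent=("T", "T"), construct_parent=("A", "-")):
    """Fraction of F1 progeny inheriting both transgenes.

    activator_parent: the two alleles of the first plant at the hybrid
        transcription factor locus ("T" = transgene present).
    construct_parent: the two alleles of the second plant at the activatable
        DNA construct locus ("A" = construct present).
    Each parent passes one allele of its own transgene locus to each offspring.
    """
    offspring = list(product(activator_parent, construct_parent))
    with_both = [pair for pair in offspring if pair == ("T", "A")]
    return Fraction(len(with_both), len(offspring))


# Homozygous activator plant crossed with a hemizygous construct plant.
print(f1_fraction_with_both())
# Both parents hemizygous: fewer F1 plants carry both transgenes.
print(f1_fraction_with_both(("T", "-"), ("A", "-")))
```

Under these assumptions every F1 plant carries the transcription factor when the first plant is homozygous, so only the activatable construct needs to be selected for in the progeny.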

III. Recombinant Production of Plant Riboflavin Biosynthesis Enzymes and Uses Thereof 

For recombinant production of a plant riboflavin biosynthesis enzyme in a host 
organism, a plant riboflavin biosynthesis coding sequence of the invention may be inserted 
into an expression cassette designed for the chosen host and introduced into the host 
where it is recombinantly produced. The choice of specific regulatory sequences such as 
promoter, signal sequence, 5' and 3' untranslated sequences, and enhancer appropriate for 
the chosen host is within the level of skill of the routineer in the art. The resultant molecule, 
containing the individual elements linked in proper reading frame, may be inserted into a 
vector capable of being transformed into the host cell. Suitable expression vectors and 
methods for recombinant production of proteins are well known for host organisms such as 
E. coli, yeast, and insect cells (see, e.g., Luckow and Summers, Bio/Technol. 6:47 (1988)). 
Specific examples include plasmids such as pBluescript (Stratagene, La Jolla, CA), pFLAG 
(International Biotechnologies, Inc., New Haven, CT), pTrcHis (Invitrogen, La Jolla, CA), and 
baculovirus expression vectors, e.g., those derived from the genome of Autographa 
californica nuclear polyhedrosis virus (AcMNPV). A preferred baculovirus/insect system is 
pVL1392/Sf21 cells (Invitrogen, La Jolla, CA). 

Recombinantly produced plant riboflavin biosynthesis enzymes can be isolated and 
purified using a variety of standard techniques. The actual techniques that may be used will 
vary depending upon the host organism used, whether the enzyme is designed for 
secretion, and other such factors familiar to the skilled artisan (see, e.g., chapter 16 of 
Ausubel, F. et al., "Current Protocols in Molecular Biology", pub. by John Wiley & Sons, Inc. 
(1994)). 

Recombinantly produced plant riboflavin biosynthesis enzymes are useful for a 
variety of purposes. For example, they can be used in in vitro assays to screen known 
herbicidal chemicals whose target has not been identified to determine if they inhibit 
riboflavin biosynthesis enzymes. Such in vitro assays may also be used as more general 
screens to identify chemicals that inhibit such enzymatic activity and that are therefore 
herbicide candidates. Alternatively, recombinantly produced riboflavin biosynthesis 
enzymes may be used to further characterize their association with known inhibitors in order 
to rationally design new inhibitory herbicides as well as herbicide tolerant forms of the 
enzymes. 

Inhibitor Assay: 

Thus, an assay useful for identifying inhibitors of essential plant genes, such as plant 
riboflavin biosynthesis genes, comprises the steps of: 

a) reacting a plant riboflavin biosynthesis enzyme and a substrate thereof in the presence 
of a suspected inhibitor of the enzyme's function; 



b) comparing the rate of enzymatic activity in the presence of the suspected inhibitor to the 
rate of enzymatic activity under the same conditions in the absence of the suspected 
inhibitor; and 

c) determining whether the suspected inhibitor inhibits the riboflavin biosynthesis enzyme. 

For example, the inhibitory effect on plant lumazine synthase may be determined by a 
reduction or complete inhibition of lumazine synthesis in the assay. Such a determination 
may be made by comparing, in the presence and absence of the candidate inhibitor, the 
amount of lumazine synthesized in the in vitro assay using fluorescence or absorbance 
detection as described infra in the Examples. A similar assay may be used to screen for 
inhibitors of the bifunctional plant GTP cyclohydrolase II / DHBP synthase enzyme. 
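
The comparison in steps b) and c) of the Inhibitor Assay can be sketched as a short calculation. This is an illustrative sketch only: the function names and the 50% cutoff are assumptions made for the example and are not values specified herein. Rates may be in any unit, provided both are measured under the same conditions.

```python
def percent_inhibition(rate_with_inhibitor, rate_without_inhibitor):
    """Percent reduction in enzymatic rate caused by the candidate inhibitor."""
    if rate_without_inhibitor <= 0:
        raise ValueError("uninhibited rate must be positive")
    return 100.0 * (1.0 - rate_with_inhibitor / rate_without_inhibitor)


def is_inhibitor(rate_with_inhibitor, rate_without_inhibitor, cutoff_percent=50.0):
    """Hypothetical decision rule: call a compound an inhibitor at or above the cutoff."""
    return percent_inhibition(rate_with_inhibitor, rate_without_inhibitor) >= cutoff_percent


# Example: lumazine formation falls from 8.0 to 2.0 (arbitrary units per minute).
print(percent_inhibition(2.0, 8.0))
print(is_inhibitor(2.0, 8.0))
```

In practice the cutoff would be chosen relative to the assay's measurement error rather than fixed in advance.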

In addition, recombinantly produced plant riboflavin biosynthesis enzymes may be 
used to elucidate the complex structure of these molecules, such as has been done for 
riboflavin synthase from Bacillus subtilis (Ladenstein et al. (1988) J. Mol. Biol. 203, 1045-1070). 
Such information regarding the structure of the plant riboflavin biosynthesis 
enzymes may be used, for example, in the rational design of new inhibitory herbicides. 

IV. Herbicide Tolerant Plants 

The present invention is further directed to plants, plant tissue, plant seeds, and plant 
cells tolerant to herbicides that inhibit the naturally occurring riboflavin biosynthesis in these 
plants, wherein the tolerance is conferred by altered riboflavin biosynthesis enzyme activity. 
Altered riboflavin biosynthesis enzyme activity may be conferred upon a plant according to 
the invention by increasing expression of wild-type herbicide-sensitive riboflavin 
biosynthesis enzyme by providing additional wild-type riboflavin biosynthesis genes to the 
plant, by expressing modified herbicide-tolerant riboflavin biosynthesis enzymes in the 
plant, or by a combination of these techniques. Representative plants include any plants to 
which these herbicides are applied for their normally intended purpose. Preferred are 
agronomically important crops such as cotton, soybean, oilseed rape, sugar beet, maize, 
rice, wheat, barley, oats, rye, sorghum, millet, turf, forage, turf grasses, and the like. 

A. Increased Expression of Wild-Type Riboflavin Biosynthesis Enzymes 
Achieving altered riboflavin biosynthesis enzyme activity through increased 
expression results in a level of a riboflavin biosynthesis enzyme in the plant cell at least 
sufficient to overcome growth inhibition caused by the herbicide. The level of expressed 
enzyme generally is at least two times, preferably at least five times, and more preferably at 
least ten times the natively expressed amount. Increased expression may be due to 
multiple copies of a wild-type riboflavin biosynthesis gene; multiple occurrences of the 
coding sequence within the gene (i.e. gene amplification) or a mutation in the non-coding, 
regulatory sequence of the endogenous gene in the plant cell. Plants having such altered 
gene activity can be obtained by direct selection in plants by methods known in the art (see, 
e.g. U.S. Patent No. 5,162,602, and U.S. Patent No. 4,761,373, and references cited 
therein). These plants also may be obtained by genetic engineering techniques known in 
the art. Increased expression of a herbicide-sensitive riboflavin biosynthesis gene can also 
be accomplished by stably transforming a plant cell with a recombinant or chimeric DNA 
molecule comprising a promoter capable of driving expression of an associated structural 
gene in a plant cell operatively linked to a homologous or heterologous structural gene 
encoding the riboflavin biosynthesis enzyme. 

B. Expression of Modified Herbicide-Tolerant Riboflavin Biosynthesis Enzymes 
According to this embodiment, plants, plant tissue, plant seeds, or plant cells are 
stably transformed with a recombinant DNA molecule comprising a suitable promoter 
functional in plants operatively linked to a coding sequence encoding a herbicide tolerant 
form of a riboflavin biosynthesis enzyme. A herbicide tolerant form of the enzyme has at 
least one amino acid substitution, addition or deletion that confers tolerance to a herbicide 
that inhibits the unmodified, naturally occurring form of the enzyme. The transgenic plants, 
plant tissue, plant seeds, or plant cells thus created are then selected by conventional 
selection techniques, whereby herbicide tolerant lines are isolated, characterized, and 
developed. Below are described methods for obtaining genes that encode herbicide 
tolerant forms of riboflavin biosynthesis enzymes: 

One general strategy involves direct or indirect mutagenesis procedures on 
microbes. For instance, a genetically manipulatable microbe such as E. coli or S. cerevisiae 
may be subjected to random mutagenesis in vivo with mutagens such as UV light or ethyl or 
methyl methane sulfonate. Mutagenesis procedures are described, for example, in Miller, 
Experiments in Molecular Genetics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 
(1972); Davis et al., Advanced Bacterial Genetics, Cold Spring Harbor Laboratory, Cold 
Spring Harbor, NY (1980); Sherman et al., Methods in Yeast Genetics, Cold Spring Harbor 
Laboratory, Cold Spring Harbor, NY (1983); and U.S. Patent No. 4,975,374. The microbe 
selected for mutagenesis contains a normal, inhibitor-sensitive riboflavin biosynthesis gene 
and is dependent upon the activity conferred by this gene. The mutagenized cells are 
grown in the presence of the inhibitor at concentrations that inhibit the unmodified gene. 
Colonies of the mutagenized microbe that grow better than the unmutagenized microbe in 
the presence of the inhibitor (i.e. exhibit resistance to the inhibitor) are selected for further 
analysis. Riboflavin biosynthesis genes from these colonies are isolated, either by cloning 
or by PCR amplification, and their sequences are elucidated. Sequences encoding altered 
gene products are then cloned back into the microbe to confirm their ability to confer 
inhibitor tolerance. 

A method of obtaining mutant herbicide-tolerant alleles of a plant riboflavin 
biosynthesis gene involves direct selection in plants. For example, the effect of a 
mutagenized riboflavin biosynthesis gene on the growth inhibition of plants such as 
Arabidopsis, soybean, or maize is determined by plating seeds sterilized by art-recognized 
methods on a simple minimal salts medium containing increasing concentrations 
of the inhibitor. Such concentrations are in the range of 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 
1, 3, 10, 30, 110, 300, 1000 and 3000 parts per million (ppm). The lowest dose at which 
significant growth inhibition can be reproducibly detected is used for subsequent 
experiments. 
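
Selection of the lowest reproducibly inhibitory dose can be sketched as follows. This is a minimal sketch under stated assumptions: growth is expressed as a fraction of the untreated control, "significant" is taken to mean at least 50% inhibition, and "reproducibly" is taken to mean that every replicate plate at that dose meets the criterion. None of these criteria, nor the function name, is specified herein.

```python
def lowest_inhibitory_dose(growth_by_dose, min_inhibition=0.5):
    """Return the lowest dose (ppm) at which all replicates show the required inhibition.

    growth_by_dose maps a dose in ppm to a list of replicate growth values,
    each expressed as a fraction of the untreated control (1.0 = no inhibition).
    Returns None if no tested dose qualifies.
    """
    max_growth = 1.0 - min_inhibition
    for dose in sorted(growth_by_dose):
        replicates = growth_by_dose[dose]
        if replicates and all(growth <= max_growth for growth in replicates):
            return dose
    return None


# Hypothetical plate readings for four doses, two replicates each.
readings = {
    0.1: [1.0, 0.9],
    1: [0.4, 0.8],   # inhibition seen on one plate only, so not reproducible
    3: [0.3, 0.2],
    10: [0.1, 0.0],
}
print(lowest_inhibitory_dose(readings))
```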

Mutagenesis of plant material is utilized to increase the frequency at which resistant 
alleles occur in the selected population. Mutagenized seed material is derived from a 
variety of sources, including chemical or physical mutagenesis of seeds, or chemical or 
physical mutagenesis of pollen (Neuffer, In Maize for Biological Research, Sheridan, ed., 
Univ. Press, Grand Forks, ND, pp. 61-64 (1982)), which is then used to fertilize plants and 
the resulting M1 mutant seeds collected. Typically for Arabidopsis, M2 seeds (Lehle Seeds, 
Tucson, AZ), which are progeny seeds of plants grown from seeds mutagenized with 
chemicals, such as ethyl methane sulfonate, or with physical agents, such as gamma rays 
or fast neutrons, are plated at densities of up to 10,000 seeds/plate (10 cm diameter) on 
minimal salts medium containing an appropriate concentration of inhibitor to select for 
tolerance. Seedlings that continue to grow and remain green 7-21 days after plating are 
transplanted to soil and grown to maturity and seed set. Progeny of these seeds are tested 
for tolerance to the herbicide. If the tolerance trait is dominant, plants whose seed 
segregate 3:1 / resistant:sensitive are presumed to have been heterozygous for the 
resistance at the M2 generation. Plants that give rise to all resistant seed are presumed to 
have been homozygous for the resistance at the M2 generation. Such mutagenesis on 
intact seeds and screening of their M2 progeny seed can also be carried out on other 
species, for instance soybean (see, e.g. U.S. Pat. No. 5,084,082). Alternatively, mutant 
seeds to be screened for herbicide tolerance are obtained as a result of fertilization with 
pollen mutagenized by chemical or physical means. 
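
Whether progeny counts are consistent with the 3:1 resistant:sensitive ratio expected from a heterozygous M2 plant carrying a dominant tolerance trait can be checked with a chi-square goodness-of-fit test. The sketch below is illustrative only; the function names are assumptions, and the critical value 3.841 is the standard chi-square threshold for one degree of freedom at p = 0.05.

```python
CHI_SQUARE_CRITICAL_1DF_P05 = 3.841  # standard table value, 1 degree of freedom


def chi_square_3_to_1(resistant, sensitive):
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    total = resistant + sensitive
    if total == 0:
        raise ValueError("at least one seedling must be scored")
    expected_resistant = 0.75 * total
    expected_sensitive = 0.25 * total
    return ((resistant - expected_resistant) ** 2 / expected_resistant
            + (sensitive - expected_sensitive) ** 2 / expected_sensitive)


def fits_3_to_1(resistant, sensitive):
    """True when the counts do not differ significantly from 3:1 at p = 0.05."""
    return chi_square_3_to_1(resistant, sensitive) < CHI_SQUARE_CRITICAL_1DF_P05


# Hypothetical progeny counts from two M2 plants.
print(fits_3_to_1(72, 28))   # close to 3:1, as expected for a heterozygote
print(fits_3_to_1(100, 0))   # all resistant, as expected for a homozygote
```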

Confirmation that the genetic basis of the herbicide tolerance is a modified riboflavin 
biosynthesis gene is ascertained as exemplified below. First, alleles of the riboflavin 
biosynthesis gene from plants exhibiting resistance to the inhibitor are isolated using PCR 
with primers based either upon conserved regions in the Arabidopsis cDNA coding 
sequences shown in SEQ ID NO:1 or SEQ ID NO:13 or, more preferably, based upon the 
unaltered riboflavin biosynthesis gene sequence from the plant used to generate tolerant 
alleles. After sequencing the alleles to determine the presence of mutations in the coding 
sequence, the alleles are tested for their ability to confer tolerance to the inhibitor on plants 
into which the putative tolerance-conferring alleles have been transformed. These plants 
can be either Arabidopsis plants or any other plant whose growth is susceptible to the 
inhibitors. Second, the riboflavin biosynthesis genes are mapped relative to known 
restriction fragment length polymorphisms (RFLPs) (see, for example, Chang et al., Proc. 
Natl. Acad. Sci. USA 85: 6856-6860 (1988); Nam et al., Plant Cell 1: 699-705 (1989)). The 
tolerance trait is independently mapped using the same markers. When tolerance is due to 
a mutation in that riboflavin biosynthesis gene, the tolerance trait maps to a position 
indistinguishable from the position of the riboflavin biosynthesis gene. 

Another method of obtaining herbicide-tolerant alleles of a riboflavin biosynthesis 
gene is by selection in plant cell cultures. Explants of plant tissue, e.g. embryos, leaf disks, 
etc. or actively growing callus or suspension cultures of a plant of interest are grown on 
medium in the presence of increasing concentrations of the inhibitory herbicide or an 
analogous inhibitor suitable for use in a laboratory environment. Varying degrees of growth 
are recorded in different cultures. In certain cultures, fast-growing variant colonies arise 
that continue to grow even in the presence of normally inhibitory concentrations of inhibitor. 
The frequency with which such faster-growing variants occur can be increased by treatment 
with a chemical or physical mutagen before exposing the tissues or cells to the inhibitor. 
Putative tolerance-conferring alleles of the riboflavin biosynthesis gene are isolated and 
tested as described in the foregoing paragraphs. Those alleles identified as conferring 
herbicide tolerance may then be engineered for optimal expression and transformed into 
the plant. Alternatively, plants can be regenerated from the tissue or cell cultures 
containing these alleles. 

Still another method involves mutagenesis of wild-type, herbicide sensitive plant 
riboflavin biosynthesis genes in bacteria or yeast, followed by culturing the microbe on 
medium that contains inhibitory concentrations of the inhibitor and then selecting those 
colonies that grow in the presence of the inhibitor. More specifically, a plant cDNA, such as 
the Arabidopsis cDNA encoding lumazine synthase (SEQ ID NO:1) or the bifunctional GTP 
cyclohydrolase II / DHBP synthase enzyme (SEQ ID NO:13), is cloned into a microbe that 
otherwise lacks the selected gene's activity. The transformed microbe is then subjected to 
in vivo mutagenesis or to in vitro mutagenesis by any of several chemical or enzymatic 
methods known in the art, e.g. sodium bisulfite (Shortle et al., Methods Enzymol. 
100:457-468 (1983)); methoxylamine (Kadonaga et al., Nucleic Acids Res. 13:1733-1745 
(1985)); oligonucleotide-directed saturation mutagenesis (Hutchinson et al., Proc. Natl. 
Acad. Sci. USA, 83:710-714 (1986)); or various polymerase misincorporation strategies (see, 
e.g. Shortle et al., Proc. Natl. Acad. Sci. USA, 79:1588-1592 (1982); Shiraishi et al., Gene 
64:313-319 (1988); and Leung et al., Technique 1:11-15 (1989)). Colonies that grow in the 
presence of normally inhibitory concentrations of inhibitor are picked and purified by 
repeated restreaking. Their plasmids are purified and tested for the ability to confer 
tolerance to the inhibitor by retransforming them into the microbe lacking riboflavin 
biosynthesis gene activity. The DNA sequences of cDNA inserts from plasmids that pass 
this test are then determined. 

Herbicide resistant riboflavin biosynthesis genes are also obtained using methods 
involving in vitro recombination, also called DNA shuffling. By DNA shuffling, mutations, 
preferably random mutations, are introduced in riboflavin biosynthesis genes. DNA shuffling 
also leads to the recombination and rearrangement of sequences within a riboflavin 
biosynthesis gene or to recombination and exchange of sequences between two or more 
different riboflavin biosynthesis protein encoding sequences. These methods allow for the 
production of millions of mutated riboflavin biosynthesis genes. The mutated genes, or 
shuffled genes, are screened for desirable properties, e.g. improved tolerance to herbicides 
and for mutations that provide broad spectrum tolerance to the different classes of inhibitor 
chemistry. Such screens are well within the skills of a routineer in the art. 

In a preferred embodiment, a mutagenized riboflavin biosynthesis gene is formed from 
at least one template riboflavin biosynthesis gene, wherein the template riboflavin 
biosynthesis gene has been cleaved into double-stranded random fragments of a desired 
size, and comprising the steps of adding to the resultant population of double-stranded 
random fragments one or more single or double-stranded oligonucleotides, wherein said 
oligonucleotides comprise an area of identity and an area of heterology to the double- 
stranded random fragments; denaturing the resultant mixture of double-stranded random 
fragments and oligonucleotides into single-stranded fragments; incubating the resultant 
population of single-stranded fragments with a polymerase under conditions which result in 
the annealing of said single-stranded fragments at said areas of identity to form pairs of 
annealed fragments, said areas of identity being sufficient for one member of a pair to 
prime replication of the other, thereby forming a mutagenized double-stranded 
polynucleotide; and repeating the second and third steps for at least two further cycles, 
wherein the resultant mixture in the second step of a further cycle includes the mutagenized 
double-stranded polynucleotide from the third step of the previous cycle, and the further 
cycle forms a further mutagenized double-stranded polynucleotide, wherein the 
mutagenized polynucleotide is a mutated riboflavin biosynthesis gene having enhanced 
tolerance to a herbicide which inhibits naturally occurring riboflavin biosynthesis activity. In a 
preferred embodiment, the concentration of a single species of double-stranded random 
fragment in the population of double-stranded random fragments is less than 1% by weight 
of the total DNA. In a further preferred embodiment, the template double-stranded 
polynucleotide comprises at least about 100 species of polynucleotides. In another 
preferred embodiment, the size of the double-stranded random fragments is from about 5 
bp to 5 kb. In a further preferred embodiment, the fourth step of the method comprises 
repeating the second and the third steps for at least 10 cycles. Such method is described 
e.g. in Stemmer et al. (1994) Nature 370: 389-391, in US Patent 5,605,793 and in Crameri 
et al. (1998) Nature 391: 288-291, as well as in WO 97/20078, and these references are 
incorporated herein by reference. 

In another preferred embodiment, any combination of two or more different riboflavin 
biosynthesis genes are mutagenized in vitro by a staggered extension process (StEP), as 
described e.g. in Zhao et al. (1998) Nature Biotechnology 16: 258-261. Briefly, the two or 
more riboflavin biosynthesis genes are used as template for PCR amplification with the 
extension cycles of the PCR reaction preferably carried out at a lower temperature than the 
optimal polymerization temperature of the polymerase. For example, when a thermostable 
polymerase with an optimal temperature of approximately 72°C is used, the temperature for 
the extension reaction is desirably below 72°C, more desirably below 65°C, preferably 
below 60°C, more preferably the temperature for the extension reaction is 55°C. 
Additionally, the duration of the extension reaction of the PCR cycles is desirably shorter 
than usually carried out in the art, more desirably it is less than 30 seconds, preferably it is 
less than 15 seconds, more preferably the duration of the extension reaction is 5 seconds. 
Only a short DNA fragment is polymerized in each extension reaction, allowing template 
switch of the extension products between the starting DNA molecules after each cycle of 
denaturation and annealing, thereby generating diversity among the extension products. 
The optimal number of cycles in the PCR reaction depends on the length of the riboflavin 
biosynthesis coding regions to be mutagenized but desirably over 40 cycles, more desirably 
over 60 cycles, preferably over 80 cycles are used. Optimal extension conditions and the 
optimal number of PCR cycles for every combination of riboflavin biosynthesis genes are 
determined using procedures well known in the art. The other parameters 
for the PCR reaction are essentially the same as commonly used in the art. The primers for 
the amplification reaction are preferably designed to anneal to DNA sequences located 
outside of the coding sequence of the riboflavin biosynthesis genes, e.g. to DNA sequences 
of a vector comprising the riboflavin biosynthesis genes, whereby the different riboflavin 
biosynthesis genes used in the PCR reaction are preferably comprised in separate vectors. 
The primers desirably anneal to sequences located less than 500 bp away from riboflavin 
biosynthesis coding sequences, preferably less than 200 bp away from the riboflavin 
biosynthesis coding sequences, more preferably less than 120 bp away from the riboflavin 
biosynthesis coding sequences. Preferably, the riboflavin biosynthesis coding sequences 
are surrounded by restriction sites, which are included in the DNA sequence amplified 
during the PCR reaction, thereby facilitating the cloning of the amplified products into a 
suitable vector. 
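
The desirable StEP cycling ranges stated above (extension below the polymerase optimum, extension shorter than 30 seconds, more than 40 cycles) can be captured in a simple check of a proposed thermocycler program. This is a hypothetical sketch: the class and function names are assumptions introduced for illustration, and only the broadest of the stated ranges are tested.

```python
from dataclasses import dataclass


@dataclass
class StepProgram:
    """Illustrative description of the extension step of a StEP program."""
    extension_temp_c: float
    extension_seconds: float
    cycles: int


def check_step_program(program, polymerase_optimum_c=72.0):
    """Return a list of departures from the desirable ranges stated in the text."""
    issues = []
    if program.extension_temp_c >= polymerase_optimum_c:
        issues.append("extension temperature is not below the polymerase optimum")
    if program.extension_seconds >= 30:
        issues.append("extension time is not below 30 seconds")
    if program.cycles <= 40:
        issues.append("cycle number is not over 40")
    return issues


# The most preferred values from the text (55 °C, 5 seconds) with over 80 cycles.
print(check_step_program(StepProgram(extension_temp_c=55, extension_seconds=5, cycles=81)))
# A conventional PCR extension step fails all three checks.
print(check_step_program(StepProgram(extension_temp_c=72, extension_seconds=60, cycles=30)))
```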

In another preferred embodiment, fragments of riboflavin biosynthesis genes having 
cohesive ends are produced as described in WO 98/05765. The cohesive ends are 
produced by ligating a first oligonucleotide corresponding to a part of a riboflavin 
biosynthesis gene to a second oligonucleotide not present in the gene or corresponding to 
a part of the gene not adjoining to the part of the gene corresponding to the first 
oligonucleotide, wherein the second oligonucleotide contains at least one ribonucleotide. A 
double-stranded DNA is produced using the first oligonucleotide as template and the 
second oligonucleotide as primer. The ribonucleotide is cleaved and removed. The 
nucleotide(s) located 5' to the ribonucleotide is also removed, resulting in double-stranded 
fragments having cohesive ends. Such fragments are randomly reassembled by ligation to 
obtain novel combinations of gene sequences. 

Any riboflavin biosynthesis gene or any combination of riboflavin biosynthesis genes 
is used for in vitro recombination in the context of the present invention, for example, a 
riboflavin biosynthesis gene derived from a plant, such as, e.g. Arabidopsis thaliana, e.g. a 
riboflavin biosynthesis gene set forth in SEQ ID NO:1 or SEQ ID NO:13, or a riboflavin 
biosynthesis gene from Bacillus or E. coli. Whole riboflavin biosynthesis genes or portions 
thereof are used in the context of the present invention. The library of mutated riboflavin 
biosynthesis genes obtained by the methods described above is cloned into appropriate 
expression vectors and the resulting vectors are transformed into an appropriate host, for 
example an alga such as Chlamydomonas, a yeast or a bacterium. A preferred host is 
one that otherwise lacks riboflavin biosynthesis gene activity. Host cells 
transformed with the vectors comprising the library of mutated riboflavin biosynthesis genes 
are cultured on medium that contains inhibitory concentrations of the inhibitor and those 
colonies that grow in the presence of the inhibitor are selected. Colonies that grow in the 
presence of normally inhibitory concentrations of inhibitor are picked and purified by 
repeated restreaking. Their plasmids are purified and the DNA sequences of cDNA inserts 
from plasmids that pass this test are then determined. 

An assay for identifying a modified riboflavin biosynthesis gene that is tolerant to an 
inhibitor may be performed in the same manner as the assay to identify inhibitors of the 
riboflavin biosynthesis enzyme (Inhibitor Assay, above) with the following modifications: 
First, a mutant riboflavin biosynthesis enzyme is substituted in one of the reaction mixtures 
for the wild-type riboflavin biosynthesis enzyme of the inhibitor assay. Second, an inhibitor 
of wild-type enzyme is present in both reaction mixtures. Third, mutated activity (activity in 
the presence of inhibitor and mutated enzyme) and unmutated activity (activity in the 
presence of inhibitor and wild-type enzyme) are compared to determine whether a 
significant increase in enzymatic activity is observed in the mutated activity when compared 
to the unmutated activity. Mutated activity is any measure of activity of the mutated enzyme 
while in the presence of a suitable substrate and the inhibitor. Unmutated activity is any 
measure of activity of the wild-type enzyme while in the presence of a suitable substrate 
and the inhibitor. A significant increase is defined as an increase in enzymatic activity that 
is larger than the margin of error inherent in the measurement technique, preferably an 
increase by about 2-fold or greater of the activity of the wild-type enzyme in the presence of 
the inhibitor, more preferably an increase by about 5-fold or greater, most preferably an 
increase by about 10-fold or greater. 
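
The comparison of mutated and unmutated activity can be expressed as a fold increase and sorted into the tiers given above. The following is a sketch under stated assumptions: the function names are illustrative, and the margin of error is modeled as a relative error with a default of 10%, a figure chosen for the example and not specified herein. The 2-, 5- and 10-fold thresholds are those stated in the text.

```python
def fold_increase(mutated_activity, unmutated_activity):
    """Ratio of mutated to unmutated activity, both measured with inhibitor present."""
    if unmutated_activity <= 0:
        raise ValueError("unmutated activity must be positive")
    return mutated_activity / unmutated_activity


def classify_increase(mutated_activity, unmutated_activity, relative_error=0.1):
    """Sort the fold increase into the tiers described in the text."""
    fold = fold_increase(mutated_activity, unmutated_activity)
    if fold <= 1.0 + relative_error:
        return "not significant"
    if fold >= 10:
        return "most preferred"
    if fold >= 5:
        return "more preferred"
    if fold >= 2:
        return "preferred"
    return "significant"


# Hypothetical activities in arbitrary units, wild-type activity 5.0 in each case.
for mutated in (5.2, 7.5, 12.0, 30.0, 50.0):
    print(mutated, classify_increase(mutated, 5.0))
```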

In addition to being used to create herbicide-tolerant plants, genes encoding 
herbicide tolerant riboflavin biosynthesis enzymes can also be used as selectable markers 
in plant cell transformation methods. For example, plants, plant tissue, plant seeds, or plant 
cells transformed with a transgene can also be transformed with a gene encoding an 
altered riboflavin biosynthesis enzyme capable of being expressed by the plant. The 
transformed cells are transferred to medium containing an inhibitor of the enzyme in an 
amount sufficient to inhibit the survivability of plant cells not expressing the modified gene, 
wherein only the transformed cells will survive. The method is applicable to any plant cell 
capable of being transformed with a modified riboflavin biosynthesis enzyme-encoding 
gene, and can be used with any transgene of interest. Expression of the transgene and the 
modified gene can be driven by the same promoter functional in plant cells, or by separate 
promoters. 

V. Plant Transformation Technology 

A wild-type or herbicide-tolerant form of the riboflavin biosynthesis gene can be 
incorporated in plant or bacterial cells using conventional recombinant DNA technology. 
Generally, this involves inserting a DNA molecule encoding the riboflavin biosynthesis 
enzyme into an expression system to which the DNA molecule is heterologous (i.e., not 
normally present) using standard cloning procedures known in the art. The vector contains 
the necessary elements for the transcription and translation of the inserted protein-coding 
sequences in a host cell containing the vector. A large number of vector systems known in 
the art can be used, such as plasmids, bacteriophage viruses and other modified viruses. 
The components of the expression system may also be modified to increase expression. 
For example, truncated sequences, nucleotide substitutions or other modifications may be 
employed. Expression systems known in the art can be used to transform virtually any crop 
plant cell under suitable conditions. Transformed cells can be regenerated into whole 
plants such that the chosen form of the riboflavin biosynthesis gene confers herbicide 
tolerance in the transgenic plants. 

A. Requirements for Construction of Plant Expression Cassettes 
Gene sequences intended for expression in transgenic plants are first assembled in 
expression cassettes behind a suitable promoter expressible in plants. The expression 
cassettes may also comprise any further sequences required or selected for the expression 
of the transgene. Such sequences include, but are not restricted to, transcription 
terminators, extraneous sequences to enhance expression such as introns, viral sequences, 
and sequences intended for the targeting of the gene product to specific organelles and cell 
compartments. These expression cassettes can then be easily transferred to the plant 
transformation vectors described infra. The following is a description of various 
components of typical expression cassettes. 

1. Promoters 

The selection of the promoter used in expression cassettes will determine the spatial 
and temporal expression pattern of the transgene in the transgenic plant. Selected 
promoters will express transgenes in specific cell types (such as leaf epidermal cells, 
mesophyll cells, root cortex cells) or in specific tissues or organs (roots, leaves or flowers, 
for example) and the selection will reflect the desired location of accumulation of the gene 
product. Alternatively, the selected promoter may drive expression of the gene under 
various inducing conditions. Promoters vary in their strength, i.e., ability to promote 
transcription. Depending upon the host cell system utilized, any one of a number of suitable 
promoters known in the art can be used. For example, for constitutive expression, the 
CaMV 35S promoter, the rice actin promoter, or the ubiquitin promoter may be used. For 
regulatable expression, the chemically inducible PR-1 promoter from tobacco or Arabidopsis 
may be used (see, e.g., U.S. Patent No. 5,689,044). 

2. Transcriptional Terminators 

A variety of transcriptional terminators are available for use in expression cassettes. 
These are responsible for the termination of transcription beyond the transgene and its 
correct polyadenylation. Appropriate transcriptional terminators are those that are known to 
function in plants and include the CaMV 35S terminator, the tml terminator, the nopaline 
synthase terminator and the pea rbcS E9 terminator. These can be used in both 
monocotyledons and dicotyledons. 

3. Sequences for the Enhancement or Regulation of Expression 

Numerous sequences have been found to enhance gene expression from within the 
transcriptional unit and these sequences can be used in conjunction with the genes of this 
invention to increase their expression in transgenic plants. For example, various intron 
sequences such as introns of the maize Adh1 gene have been shown to enhance 
expression, particularly in monocotyledonous cells. In addition, a number of non-translated 
leader sequences derived from viruses are also known to enhance expression, and these 
are particularly effective in dicotyledonous cells. 

4. Coding Sequence Optimization 

The coding sequence of the selected gene may be genetically engineered by altering 
the coding sequence for optimal expression in the crop species of interest. Methods for 
modifying coding sequences to achieve optimal expression in a particular crop species are 
well known (see, e.g. Perlak et al., Proc. Natl. Acad. Sci. USA 88: 3324 (1991); and Koziel 
et al., Bio/technol. 11: 194 (1993)). 

5. Targeting of the Gene Product Within the Cell 
Various mechanisms for targeting gene products are known to exist in plants and the 
sequences controlling the functioning of these mechanisms have been characterized in 
some detail. For example, the targeting of gene products to the chloroplast is controlled by 
a signal sequence found at the amino terminal end of various proteins which is cleaved 
during chloroplast import to yield the mature protein (e.g. Comai et al. J. Biol. Chem. 263: 
15104-15109 (1988)). Other gene products are localized to other organelles such as the 
mitochondrion and the peroxisome (e.g. Unger et al. Plant Molec. Biol. 13: 411-418 (1989)). 
The cDNAs encoding these products can also be manipulated to effect the targeting of 
heterologous gene products to these organelles. In addition, sequences have been 
characterized which cause the targeting of gene products to other cell compartments. 
Amino terminal sequences are responsible for targeting to the ER, the apoplast, and 
extracellular secretion from aleurone cells (Koehler & Ho, Plant Cell 2: 769-783 (1990)). 
Additionally, amino terminal sequences in conjunction with carboxy terminal sequences are 
responsible for vacuolar targeting of gene products (Shinshi et al. Plant Molec. Biol. 14: 
357-368 (1990)). By the fusion of the appropriate targeting sequences described above to 
transgene sequences of interest it is possible to direct the transgene product to any 
organelle or cell compartment. 

B. Construction of Plant Transformation Vectors

Numerous transformation vectors available for plant transformation are known to
those of ordinary skill in the plant transformation arts, and the genes pertinent to this
invention can be used in conjunction with any such vectors. The selection of vector will
depend upon the preferred transformation technique and the target species for
transformation. For certain target species, different antibiotic or herbicide selection markers
may be preferred. Selection markers used routinely in transformation include the nptII
gene, which confers resistance to kanamycin and related antibiotics (Messing & Vierra,
Gene 19: 259-268 (1982); Bevan et al., Nature 304: 184-187 (1983)), the bar gene, which
confers resistance to the herbicide phosphinothricin (White et al., Nucl. Acids Res. 18: 1062
(1990); Spencer et al., Theor. Appl. Genet. 79: 625-631 (1990)), the hph gene, which confers
resistance to the antibiotic hygromycin (Blochinger & Diggelmann, Mol. Cell Biol. 4: 2929-2931),
the dhfr gene, which confers resistance to methotrexate (Bourouis et al., EMBO
J. 2(7): 1099-1104 (1983)), and the EPSPS gene, which confers resistance to glyphosate
(U.S. Patent Nos. 4,940,935 and 5,188,642).

1. Vectors Suitable for Agrobacterium Transformation

Many vectors are available for transformation using Agrobacterium tumefaciens. 
These typically carry at least one T-DNA border sequence and include vectors such as 
pBIN19 (Bevan, Nucl. Acids Res. (1984)) and pXYZ. Typical vectors suitable for
Agrobacterium transformation include the binary vectors pCIB200 and pCIB2001, as well as
the binary vector pCIB10 and hygromycin selection derivatives thereof. (See, for example,
U.S. Patent No. 5,639,949).

2. Vectors Suitable for non-Agrobacterium Transformation

Transformation without the use of Agrobacterium tumefaciens circumvents the
requirement for T-DNA sequences in the chosen transformation vector and consequently
vectors lacking these sequences can be utilized in addition to vectors such as the ones
described above which contain T-DNA sequences. Transformation techniques that do not
rely on Agrobacterium include transformation via particle bombardment, protoplast uptake
(e.g. PEG and electroporation) and microinjection. The choice of vector depends largely on
the preferred selection for the species being transformed. Typical vectors suitable for non-
Agrobacterium transformation include pCIB3064, pSOG19, and pSOG35. (See, for
example, U.S. Patent No. 5,639,949).

C. Transformation Techniques

Once the coding sequence of interest has been cloned into an expression system, it is 
transformed into a plant cell. Methods for transformation and regeneration of plants are well 
known in the art. For example, Ti plasmid vectors have been utilized for the delivery of 
foreign DNA, as well as direct DNA uptake, liposomes, electroporation, micro-injection, and
microprojectiles. In addition, bacteria from the genus Agrobacterium can be utilized to
transform plant cells.

Transformation techniques for dicotyledons are well known in the art and include 
Agrobacterium-based techniques and techniques that do not require Agrobacterium. Non-
Agrobacterium techniques involve the uptake of exogenous genetic material directly by
protoplasts or cells. This can be accomplished by PEG or electroporation mediated uptake, 
particle bombardment-mediated delivery, or microinjection. In each case the transformed 
cells are regenerated to whole plants using standard techniques known in the art. 

Transformation of most monocotyledon species has now also become routine. 
Preferred techniques include direct gene transfer into protoplasts using PEG or 
electroporation techniques, particle bombardment into callus tissue, as well as 
Agrobacterium-mediated transformation.



VI. Breeding 

The wild-type or altered form of a riboflavin biosynthesis gene of the present invention 
can be utilized to confer herbicide tolerance to a wide variety of plant cells, including those
of gymnosperms, monocots, and dicots. Although the gene can be inserted into any plant
cell falling within these broad classes, it is particularly useful in crop plant cells, such as rice,
wheat, barley, rye, corn, potato, carrot, sweet potato, sugar beet, bean, pea, chicory,
lettuce, cabbage, cauliflower, broccoli, turnip, radish, spinach, asparagus, onion, garlic, 
eggplant, pepper, celery, carrot, squash, pumpkin, zucchini, cucumber, apple, pear, quince, 
melon, plum, cherry, peach, nectarine, apricot, strawberry, grape, raspberry, blackberry, 
pineapple, avocado, papaya, mango, banana, soybean, tobacco, tomato, sorghum and 
sugarcane. 

The high-level expression of a wild-type riboflavin biosynthesis gene and/or the 
expression of herbicide-tolerant forms of a riboflavin biosynthesis gene conferring herbicide
tolerance in plants, in combination with other characteristics important for production and
quality, can be incorporated into plant lines through breeding approaches and techniques
known in the art. 

Where a herbicide tolerant riboflavin biosynthesis gene allele is obtained by direct 
selection in a crop plant or plant cell culture from which a crop plant can be regenerated, it 
is moved into commercial varieties using traditional breeding techniques to develop a 
herbicide tolerant crop without the need for genetically engineering the allele and 
transforming it into the plant. 

The invention will be further described by reference to the following detailed 
examples. These examples are provided for purposes of illustration only, and are not 
intended to be limiting unless otherwise specified.

BRIEF DESCRIPTION OF THE SEQUENCES IN THE SEQUENCE LISTING

SEQ ID NO:1 is a cDNA sequence encoding the β subunit of riboflavin synthase
(lumazine synthase) from Arabidopsis thaliana.

SEQ ID NO:2 is the predicted amino acid sequence of Arabidopsis thaliana lumazine
synthase encoded by SEQ ID NO:1.

SEQ ID NO:3 is oligonucleotide DG-63.

SEQ ID NO:4 is oligonucleotide DG-65.

SEQ ID NO:5 is oligonucleotide JG-L.

SEQ ID NO:6 is oligonucleotide RS-1.

SEQ ID NO:7 is oligonucleotide RS-2.

SEQ ID NO:8 is a synthetic peptide used in Example 7.

SEQ ID NO:9 is another synthetic peptide used in Example 7.

SEQ ID NO:10 is oligonucleotide DG-252.

SEQ ID NO:11 is oligonucleotide DG-253.

SEQ ID NO:12 is oligonucleotide DG-254.

SEQ ID NO:13 is a partial cDNA sequence encoding the bifunctional GTP
cyclohydrolase II / DHBP synthase enzyme from Arabidopsis thaliana.

SEQ ID NO:14 is the predicted amino acid sequence of the mature Arabidopsis
thaliana GTP cyclohydrolase II / DHBP synthase enzyme encoded by SEQ ID NO:13.

SEQ ID NO:15 is oligonucleotide DG-67.

SEQ ID NO:16 is oligonucleotide DG-69.

SEQ ID NO:17 is oligonucleotide DG-392a.

SEQ ID NO:18 is oligonucleotide DG-393a.

SEQ ID NO:19 is oligonucleotide DG-390a.

SEQ ID NO:20 is oligonucleotide DG-391a.

EXAMPLES

Standard recombinant DNA and molecular cloning techniques used here are well 
known in the art and are described by Sambrook et al., Molecular Cloning, eds., Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1989) and by T.J. Silhavy, M.L.
Berman, and L.W. Enquist, Experiments with Gene Fusions, Cold Spring Harbor
Laboratory, Cold Spring Harbor, NY (1984) and by Ausubel, F.M. et al., Current Protocols in
Molecular Biology, pub. by Greene Publishing Assoc. and Wiley-Interscience (1987).

Example 1: Isolation of a cDNA Encoding Lumazine Synthase from Arabidopsis

A search of the Arabidopsis thaliana Expressed Sequence Tag (EST) database 
(Arabidopsis Biological Resource Center at Ohio State, Ohio State University, Columbus, 
OH) revealed an EST (EST # P25540; gb acc. # Z34233) with homology to the β subunit of
riboflavin synthase from E. coli. Using plasmid DNA of an Arabidopsis cDNA library (Minet
et al., (1992) Plant J. 2: 417-422) as a template, and synthetic oligonucleotides DG-63
(SEQ ID NO:3) and DG-65 (SEQ ID NO:4) designed to the EST sequence, a 204-bp DNA
fragment was generated using the polymerase chain reaction (PCR). The 204-bp fragment
was ligated into the TA cloning vector pCR II (Invitrogen Corp., San Diego, CA). Sequence
determination by the chain termination method using dideoxy terminators labeled with
fluorescent dyes (Applied Biosystems, Inc., Foster City, CA) confirmed that the sequence of
the 204-bp fragment was identical to the sequence of EST # P25540.

Approximately 150,000 pfu of a lambda ZAP Arabidopsis cDNA library was plated at a
density of 8,000 plaques per 10 cm Petri dish, and filter lifts of the plaques were made after
7 hours growth at 37°C. The plaque lifts were probed with the 204-bp fragment labeled with
32P-dCTP by the random priming method by means of a PrimeTime kit (International
Biotechnologies, Inc., New Haven, CT). Hybridization conditions were 7% sodium dodecyl
sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM EDTA, 1% bovine albumin at 65°C. After
hybridization overnight, the filters were washed with 1% SDS, 50 mM NaPO4, 1 mM EDTA at
65°C. Six positively hybridizing plaques were detected by autoradiography. After
purification to single plaques, cDNA inserts were isolated, and their sequences were
determined by the chain termination method using dideoxy terminators labeled with
fluorescent dyes (Applied Biosystems, Inc., Foster City, CA). A database search of the
longest clone, designated RSp-1, using the GAP program (Devereux et al., (1984) Nucleic
Acids Res. 12: 387-95) revealed sequence similarity to the riboflavin synthase β subunit
from E. coli. The proteins are 68% similar and 44% identical. In addition, a comparison of
the Arabidopsis mature protein to the E. coli riboflavin synthase β subunit suggests a
chloroplast transit peptide is present.
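
For readers unfamiliar with how "percent identical" and "percent similar" figures are derived from a pairwise alignment, the following is a minimal sketch. It is not the GAP program: the similarity groups below are illustrative assumptions (GAP scores pairs with a substitution matrix), and counting only gap-free columns in the denominator is likewise an assumed convention.

```python
# Illustrative groups of amino acids treated as "similar" (an assumption for
# this sketch; alignment programs normally use a scoring matrix instead).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def _similar(a: str, b: str) -> bool:
    """True if two residues are identical or fall in the same group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aligned_a: str, aligned_b: str, gap: str = "-"):
    """Return (percent identity, percent similarity) over gap-free columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        raise ValueError("alignment has no gap-free columns")
    identical = sum(a == b for a, b in pairs)
    similar = sum(_similar(a, b) for a, b in pairs)
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)


# Toy aligned fragments (made up; not taken from SEQ ID NO:2).
identity, similarity = identity_and_similarity("MKV-LA", "MRVTLG")
print(f"{identity:.0f}% identical, {similarity:.0f}% similar")
```

Because similarity always includes the identical positions, the similarity figure is never lower than the identity figure, which matches the ordering of the two values reported above.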

RSp-1, in the pBluescript SK vector, was deposited as pDG-4a.t. with the Agricultural
Research Culture Collection (NRRL), 1815 N. University St., Peoria, IL 61604, USA under
the terms of the Budapest Treaty on February 7, 1995, and assigned NRRL accession
number B-21400.

The Arabidopsis cDNA sequence encoding RSp-1 is set forth in SEQ ID NO:1 and the
encoded amino acid sequence is set forth in SEQ ID NO:2.

Example 2: Isolation of Additional Lumazine Synthase Genes based on Sequence 
Similarity to the Arabidopsis Lumazine Synthase Coding Sequence 

A phage or plasmid library is plated at a density of approximately 8,000 pfu per 10
cm Petri dish, and filter lifts of the plaques are made after 7 hours growth at 37°C. The
plaque lifts are probed with the cDNA set forth in SEQ ID NO:1, labeled with 32P-dCTP by
the random priming method by means of a PrimeTime kit (International Biotechnologies,
Inc., New Haven, CT). Hybridization conditions are 7% sodium dodecyl sulfate (SDS), 0.5
M NaPO4 pH 7.0, 1 mM EDTA at 50°C. After hybridization overnight, the filters are washed
with 2X SSC, 1% SDS at 50°C. Positively hybridizing plaques are detected by
autoradiography. After purification to single plaques, cDNA inserts are isolated, and their
sequences determined by the chain termination method using dideoxy terminators labeled
with fluorescent dyes (Applied Biosystems, Inc., Foster City, CA). This experimental
protocol can be used by one of ordinary skill in the art to obtain lumazine synthase genes
substantially similar to the Arabidopsis coding sequence (SEQ ID NO:1) from any other
plant species.

Example 3: Construction of a Vector Containing a GAL4 Binding Site/Minimal 35S CaMV
Promoter Fused to Antisense Lumazine Synthase

pAT71: 

10 GAL4 binding sites and the minimal 35S promoter (-59 to +1) were excised from 
pGALLuc2 (Goff et al., (1991) Genes & Development 5: 298-309) as an EcoRI-PstI
fragment and inserted into the respective sites of pBluescript, yielding pAT52. pAT66 was
constructed with a three-way ligation between the HindIII-PstI fragment of pAT52, a PstI-EcoRI
fragment of pCIB1716 (contains a 35S untranslated leader, GUS gene, 35S
terminator) and HindIII-EcoRI cut pUC18. The 35S leader of pAT66 was excised with PstI-NcoI
and replaced with a PCR-generated 35S leader extending from +1 to +48 to yield
pAT71. 

pJG304: 

Plasmid pBS SK+ (Stratagene, La Jolla, CA) was linearized with SacI, treated with
mung bean nuclease to remove the SacI site, and re-ligated with T4 ligase to make
pJG201. The 10XGAL4 consensus binding site/CaMV 35S minimal promoter/GUS
gene/CaMV terminator cassette was removed from pAT71 with KpnI and cloned into the
KpnI site of pJG201 to make pJG304.

pJG304 was partially digested with restriction endonuclease Asp718 to isolate a full- 
length linear fragment. This fragment was ligated with a molar excess of the 22 base 
oligonucleotide JG-L (SEQ ID NO:5). Restriction analysis was used to identify a clone with
this linker inserted 5' to the GAL4 DNA binding site, and this plasmid was designated
pJG304?XhoI.

pDG1: 

A fragment of the lumazine synthase cDNA clone was PCR-amplified from the cDNA 
clone RSp-1 using the oligonucleotides RS-1 (SEQ ID NO:6) and RS-2 (SEQ ID NO:7).
This PCR product comprises the 5' portion of the lumazine synthase cDNA (SEQ ID NO:1),
ending at base pair 792.

The vector pJG304?XhoI was digested with SacI and NcoI to excise the GUS gene
coding sequence. The lumazine synthase PCR fragment was digested with SacI and NcoI
and ligated into pJG304?XhoI to make pDG1.

Example 4: Plant Transformation Vectors For Lumazine Synthase Antisense Expression
From The GAL4 Binding Site/CaMV Minimal 35S Promoter 



pJG261:

Vector pGPTV (Becker et al., (1992) Plant Molecular Biology 20: 1195-1197) was
digested with EcoRI and HindIII to remove the nopaline synthase promoter/GUS cassette.
Concurrently, the superlinker was excised from pSE380 (Invitrogen, San Diego, CA) with
EcoRI and HindIII and cloned into the EcoRI/HindIII linearized pGPTV, to make pJG261.



pDG2: 

pDG1 was cut with XhoI to excise the cassette containing the GAL4 DNA binding
site/35S minimal promoter/antisense lumazine synthase/CaMV terminator fusion. This
cassette was ligated into XhoI-digested pJG261, such that transcription was divergent from
that of the bar selectable marker, producing pDG2.

Example 5: Production Of GAL4 Binding Site/Minimal CaMV 35S
Antisense Lumazine Synthase Transgenic Plants 

pDG2 was electro-transformed (Bio-Rad Laboratories, Hercules, CA) into
Agrobacterium tumefaciens strain C58C1 (pMP90), and Arabidopsis plants (Ecotype
Columbia) were transformed by infiltration (Bechtold et al., (1993) C. R. Acad. Sci. Paris,
316: 1188-93). Seeds from the infiltrated plants were selected on germination medium
(Murashige-Skoog salts at 4.3 g/liter, Mes at 0.5 g/liter, 1% sucrose, thiamine at 10 µg/liter,
pyridoxine at 5 µg/liter, nicotinic acid at 5 µg/liter, myo-inositol at 1 mg/liter, pH 5.8)
containing Basta at 15 mg/liter. 

Example 6: Production of GAL4/C1 Transactivator Transgenic Plants 

pSGZL1 was constructed by ligating the GAL4-C1 EcoRI fragment from pGALC1
(Goff et al., (1991) Genes & Development 5: 298-309) into the EcoRI site of pIC20H. The
GAL4-C1 fragment of pSGZL1 was excised with BamHI-BglII and inserted into the BamHI
site of pCIB770 (Rothstein et al., (1987) Gene 53: 153-161) yielding pAT53.

Arabidopsis root explants were transformed with pAT53 as described in Valvekens et
al., (1988) PNAS USA 85: 5536-5540. Transgenic plants with single site insertion and
positive for GAL4/C1 expression were taken to homozygosity.

Example 7: Antisense Inhibition of Lumazine Synthase Using a GAL4/C1 Transactivator 
and a GAL4 Binding Site/Minimal CaMV 35S Promoter 

Fifteen transgenic plants containing the GAL4 binding site/minimal CaMV 35S 
promoter/antisense lumazine synthase construct were transplanted to soil and grown to
maturity in the greenhouse. Flowers borne on the primary transformants were crossed to 
pollen from the homozygous GAL4/C1 transactivator line pAT53-103. F1 seeds were plated 
on germination medium and germination medium containing 15 mg/liter Basta. One line 
gave a 50% lethal phenotype on plates. Seedlings from the remaining F1 lines were 
transplanted to soil and grown to maturity in the greenhouse. Half of the seedlings from 2 
F1 lines died while in soil. 
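
A 50% lethal class is what a simple one-locus model would predict, under assumptions the text does not state explicitly: that the primary transformant carries the antisense construct at a single locus in the hemizygous state, that the pollen parent is homozygous for the transactivator (as stated above), and that lethality requires both transgenes in the same F1 plant. The sketch below only encodes that arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def transmitting_fraction(copies: int) -> Fraction:
    """Fraction of gametes that carry a single-locus transgene.

    `copies` is the number of transgene copies at that locus in the diploid
    parent: 0 (absent), 1 (hemizygous) or 2 (homozygous).
    """
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1 or 2")
    return Fraction(copies, 2)


def f1_fraction_with_both(antisense_copies: int, transactivator_copies: int) -> Fraction:
    """Expected fraction of F1 progeny inheriting both transgenes.

    The antisense construct comes from the seed parent and the transactivator
    from the pollen parent, so the two transmission events are independent.
    """
    return transmitting_fraction(antisense_copies) * transmitting_fraction(
        transactivator_copies
    )


# Hemizygous primary transformant crossed with homozygous transactivator pollen.
expected = f1_fraction_with_both(antisense_copies=1, transactivator_copies=2)
print(expected)
```

Lines with more than one unlinked insertion, or with incomplete penetrance of the antisense effect, would be expected to depart from this fraction, which may be one reason only some F1 lines showed the phenotype.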

Lumazine synthase antibody was generated in goat by injecting the synthetic peptides
CIGAVIRGDTT (SEQ ID NO:8) and KAGNKGAETALTALEM (SEQ ID NO:9) conjugated to
purified protein derivative. Western analysis of F1 plants revealed a significant decrease in
lumazine synthase levels (Towbin et al., PNAS USA 76: 4350-4354).

Example 8: Expression and Purification of Recombinant Plant Lumazine Synthase in E. coli 

To produce recombinant plant lumazine synthase in E. coli, a translational fusion of 
the Arabidopsis lumazine synthase cDNA (SEQ ID NO:1) to the 5' end of the thioredoxin
gene (LaVallie et al., (1992) Biotechnology 11: 187-193) was created in pET-32a (Novagen,
Inc., Madison, WI) using PCR. Synthetic oligonucleotide primers DG-252 (SEQ ID NO:10),
DG-253 (SEQ ID NO:11), and DG-254 (SEQ ID NO:12) were used in a polymerase chain
reaction to amplify DNA fragments of 693-bp and 483-bp in length. The PCR products were
digested with NcoI and EcoRI. The digestion products were separated on a low-gelling-
temperature agarose gel and the fragments were excised. In parallel, plasmid pET32a was
digested with NcoI and EcoRI. The digestion products were separated on a gel, and the
pET32a vector was excised from the gel. The vector fragment was ligated to the two PCR-
generated fragments, and the ligation products were transformed into competent E. coli XL1
Blue cells (Stratagene, La Jolla, CA). 

Ampicillin-resistant colonies were selected, cultured, and their plasmid DNAs 
extracted. The structures of the plasmids were confirmed by sequencing with the chain 
termination method using dideoxy terminators labeled with fluorescent dyes (Applied 
Biosystems, Inc., Foster City, CA). The recombinant plasmids with expected structure were
designated pET32aRSp FL-1 and pET32aRSp No CTP-1. 

Plasmids pET32aRSp FL-1 and pET32aRSp No CTP-1 were transformed into 
competent E. coli BL21(DE3) cells, and recombinant protein was expressed and purified
according to the manufacturer's instructions (pET System Manual, Novagen, Inc., Madison,
WI). The resulting fusion proteins produced by this strain contained approximately 132
amino acids of E. coli thioredoxin protein, His-Tag, and thrombin cleavage site, followed by
the presumptive mature coding sequence for Arabidopsis lumazine synthase, which begins
at codon 1 of the predicted protein coding sequence for plasmid pET32aRSp FL-1, and
codon 71 of the predicted protein coding sequence for plasmid pET32aRSp No CTP-1.
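
As a quick arithmetic check (not a statement from the text about primer design), the size difference between the two amplified fragments can be compared with the stated start codons of the two constructs. The sketch assumes that both PCR products share the same 3' end and carry primer-added sequence of equal length, so that their length difference reflects only the codons removed from the 5' end.

```python
def codon_start_offset(codon_number: int) -> int:
    """0-based nucleotide offset of the first base of a 1-based codon number."""
    if codon_number < 1:
        raise ValueError("codon numbers start at 1")
    return 3 * (codon_number - 1)


FULL_LENGTH_BP = 693  # fragment size reported for the full-length construct
NO_CTP_BP = 483       # fragment size reported for the construct starting at codon 71

removed_bp = FULL_LENGTH_BP - NO_CTP_BP
removed_codons, leftover_bases = divmod(removed_bp, 3)

print(removed_bp, removed_codons, leftover_bases)
print(codon_start_offset(71))
```

Under these assumptions the two numbers agree: the shorter fragment lacks a whole number of codons, and that number corresponds to starting the coding sequence at codon 71, so the reading frame of the thioredoxin fusion is preserved in both constructs.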

Example 9: Lumazine Synthase Activity Assay 

Lumazine synthase activity is detected using an HPLC and fluorimeter combination. 
Both lumazine and 2,4-dioxy-5-amino-6-ribitylamino-pyrimidine (DARP) are fluorescent
under the following conditions: excitation wavelength 407 nm; emission wavelength 487 nm.
However, lumazine is about 6-fold more fluorescent than an equimolar concentration of
DARP. There is also a 6-fold difference in absorbance between lumazine and DARP at 405
nm. 3,4-dihydroxy-2-butanone phosphate does not fluoresce. Lumazine and DARP can be
separated on a C18 column using 33% 90 mM formic acid, 60% water, and 7% methanol.
Lumazine elutes first at four minutes, followed two minutes later by DARP. 

The peak area can be directly related to the molar quantity of lumazine produced. 
Optimization studies have shown the buffer for the reaction to be preferably 100 mM KPO4, 
pH 7, 5 mM β-mercaptoethanol, 2 mM DTT. The enzyme is active at a pH range of 6.5 -
7.5, but pH 7 is most preferable. Kinetic studies show that the Km for the butanone
phosphate is 190 µM and the Km for DARP is 5.5 µM. Kis et al., Biochem. 34: 2883-2892
(1995) reported Km values of 130 and 5, respectively, for the bacterial enzyme. The reaction
is incubated at 37°C for ten minutes and then stopped by the addition of 5% TCA. The
precipitated proteins are removed by centrifugation and 10 µl of the supernatant is injected
onto the HPLC. Because the reaction can proceed non-enzymatically, controls should be
run with all samples to subtract this background activity. 
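
The two practical points of this assay, the Km values and the subtraction of the non-enzymatic background, can be sketched as follows. The sketch reads the reported Km values as 190 µM (butanone phosphate) and 5.5 µM (DARP), and it treats each substrate with a single-substrate Michaelis-Menten term, which is a simplification for a two-substrate enzyme. Function names and the example peak areas are hypothetical.

```python
KM_BUTANONE_PHOSPHATE_UM = 190.0  # reported Km, read as micromolar
KM_DARP_UM = 5.5                  # reported Km, read as micromolar


def fractional_saturation(substrate_um: float, km_um: float) -> float:
    """Michaelis-Menten v/Vmax for one substrate with the other held saturating."""
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return substrate_um / (km_um + substrate_um)


def net_enzymatic_signal(sample_peak_area: float, control_peak_area: float) -> float:
    """Lumazine peak area attributable to the enzyme.

    The control is a matched reaction without enzyme, run because lumazine
    also forms non-enzymatically.
    """
    return sample_peak_area - control_peak_area


# At a substrate concentration equal to Km the rate is half of Vmax.
print(fractional_saturation(KM_DARP_UM, KM_DARP_UM))
# Made-up peak areas, in arbitrary integration units.
print(net_enzymatic_signal(sample_peak_area=120.0, control_peak_area=20.0))
```

Because the Km for DARP is much lower than that for the butanone phosphate, the enzyme approaches saturation with DARP at far lower concentrations, which is relevant when choosing substrate concentrations for an inhibitor screen.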

Example 10: High Throughput Screen 

A high throughput screen for novel inhibitors of lumazine synthase preferably 
exploits the fact that lumazine and DARP fluoresce at different intensities under optimal 
conditions for lumazine or the fact that there is a 6-fold difference in absorbance between 
these two compounds. An example of a protocol for a high throughput screen using 
fluorescence detection is as follows: lumazine synthase, buffer, test substance, and DARP 
are mixed together in the wells of a 96-well microtiter plate to a volume of 190 µl and the
initial fluorescence value is determined (with, for example, a Waters fluorimetric microtiter
plate reader). Reactions commence with the addition of a 10 µl aliquot of 3,4-dihydroxy-2-
butanone phosphate. After an appropriate incubation time, fluorescence is determined
again. The differences between initial and final readings are then scaled as a percent of 
control reactions. Initial concentrations of substrates in the complete reaction mixture are 
preferably 50 for DARP and 0.5 mM for the butanone phosphate. Lumazine synthase 
amount and incubation time are adjusted to allow for the production of lumazine to a 
concentration of approximately 25 µM. This will produce a fluorescence signal that is
approximately 3 to 4-fold greater than background. 
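
The scaling step described above, in which the change in fluorescence of each test well is expressed as a percent of the change in control wells, can be sketched as follows. The function names and the example readings are hypothetical, and the control is assumed to be a reaction run without test substance.

```python
def percent_of_control(initial: float, final: float,
                       control_initial: float, control_final: float) -> float:
    """Change in fluorescence of a test well as a percent of the control change."""
    control_delta = control_final - control_initial
    if control_delta == 0:
        raise ValueError("control reaction shows no change in fluorescence")
    return 100.0 * (final - initial) / control_delta


def percent_inhibition(initial: float, final: float,
                       control_initial: float, control_final: float) -> float:
    """Complement of percent_of_control; higher values mean stronger inhibition."""
    return 100.0 - percent_of_control(initial, final, control_initial, control_final)


# Made-up plate-reader values in arbitrary fluorescence units.
activity = percent_of_control(initial=100.0, final=200.0,
                              control_initial=100.0, control_final=500.0)
print(activity, percent_inhibition(100.0, 200.0, 100.0, 500.0))
```

Using the difference between initial and final readings, rather than the final reading alone, removes the contribution of DARP and of any intrinsically fluorescent test substance present before the reaction starts.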

Example 11: Isolation of a cDNA Encoding the Bifunctional GTP Cyclohydrolase II /
3,4-Dihydroxy-2-Butanone-4-Phosphate Synthase from Arabidopsis

A search of the Arabidopsis thaliana Expressed Sequence Tag (EST) database 
(Arabidopsis Biological Resource Center at Ohio State, Ohio State University, Columbus,
OH) revealed an EST (EST # SCH1T7P; gb acc. # T12970) with homology to GTP
cyclohydrolase from Bacillus subtilis. Using plasmid DNA of an Arabidopsis cDNA library
(Minet et al. (1992) Plant J. 2: 417-422) as a template, and synthetic oligonucleotides DG-67
(SEQ ID NO:15) and DG-69 (SEQ ID NO:16) designed to the EST sequence, a 322-bp
DNA fragment was generated using the polymerase chain reaction (PCR). The 322-bp
fragment was ligated into the TA cloning vector pCR II (Invitrogen Corp., San Diego, CA).
Sequence determination by the chain termination method using dideoxy terminators labeled
with fluorescent dyes (Applied Biosystems, Inc., Foster City, CA) confirmed that the
sequence of the 322-bp fragment was identical to the sequence of EST # SCH1T7P. 

Approximately 150,000 pfu of a lambda ZAP Arabidopsis cDNA library was plated at a
density of 8,000 plaques per 10 cm Petri dish, and filter lifts of the plaques were made after
7 hours growth at 37°C. The plaque lifts were probed with the 322-bp fragment labeled with
32P-dCTP by the random priming method by means of a PrimeTime kit (International
Biotechnologies, Inc., New Haven, CT). Hybridization conditions were 7% sodium dodecyl
sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM EDTA, 1% bovine albumin at 65°C. After
hybridization overnight, the filters were washed with 1% SDS, 50 mM NaPO4, 1 mM EDTA at
65°C. Ten positively hybridizing plaques were detected by autoradiography. After
purification to single plaques, cDNA inserts were isolated, and their sequences were
determined by the chain termination method using dideoxy terminators labeled with
fluorescent dyes (Applied Biosystems, Inc., Foster City, CA). A database search of the
longest clone, designated GTP-1, using the GAP program (Devereux et al., Nucleic Acids
Res. 12: 387-95 (1984)), revealed sequence similarity to the bifunctional GTP cyclohydrolase
II/3,4-dihydroxy-2-butanone-4-phosphate synthase of Bacillus subtilis. The proteins are
70% similar and 54% identical. In addition, a comparison of the Arabidopsis mature protein
to the Bacillus subtilis GTP cyclohydrolase II/3,4-dihydroxy-2-butanone-4-phosphate
synthase suggests a chloroplast transit peptide is present. 
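
The plating figures given in this example (approximately 150,000 pfu screened at 8,000 plaques per 10 cm dish) imply a number of dishes that can be worked out directly. The helper below is a hypothetical convenience for planning such a screen and assumes that a partly filled final dish counts as a whole dish.

```python
def dishes_needed(total_pfu: int, pfu_per_dish: int) -> int:
    """Number of dishes required to plate total_pfu at the given density."""
    if total_pfu < 0 or pfu_per_dish <= 0:
        raise ValueError("total_pfu must be >= 0 and pfu_per_dish must be > 0")
    # Integer ceiling division: a partly filled last dish still needs a dish.
    return -(-total_pfu // pfu_per_dish)


print(dishes_needed(150_000, 8_000))
```

The same helper applies to the prospective library screens described for other plant species, where the total number of plaques to screen would be chosen according to the complexity of the library.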

GTP-1, in the pBluescript SK vector, was deposited as pDG-3a.t. with the Agricultural
Research Culture Collection (NRRL), 1815 N. University St., Peoria, IL 61604, USA under
the terms of the Budapest Treaty on February 7, 1995, and assigned NRRL accession
number B-21399.

The Arabidopsis cDNA sequence encoding GTP-1 is set forth in SEQ ID NO:13 and
the amino acid sequence of the encoded mature protein, without the putative transit
peptide, is set forth in SEQ ID NO:14.

Example 12: Isolation of Additional GTP Cyclohydrolase II / 3,4-Dihydroxy-2-Butanone-4-Phosphate
Synthase Genes Based On Sequence Homology to the Arabidopsis
GTP Cyclohydrolase II / 3,4-Dihydroxy-2-Butanone-4-Phosphate Synthase Coding
Sequence

A phage or plasmid library is plated at a density of approximately 8,000 pfu per 10 cm
Petri dish, and filter lifts of the plaques are made after 7 hours growth at 37°C. The plaque
lifts are probed with the cDNA set forth in SEQ ID NO:13, labeled with 32P-dCTP by the
random priming method by means of a PrimeTime kit (International Biotechnologies, Inc.,
New Haven, CT). Hybridization conditions are 7% sodium dodecyl sulfate (SDS), 0.5 M
NaPO4 pH 7.0, 1 mM EDTA at 50°C. After hybridization overnight, the filters are washed
with 2X SSC, 1% SDS at 50°C. Positively hybridizing plaques are detected by
autoradiography. After purification to single plaques, cDNA inserts are isolated, and their
sequences are determined by the chain termination method using dideoxy terminators
labeled with fluorescent dyes (Applied Biosystems, Inc., Foster City, CA). This experimental
protocol can be used by one of ordinary skill in the art to obtain bifunctional GTP
cyclohydrolase II / 3,4-dihydroxy-2-butanone-4-phosphate synthase genes substantially
similar to the Arabidopsis coding sequence (SEQ ID NO:13) from any other plant species.

Example 13: Expression and Purification of Recombinant Plant GTP Cyclohydrolase II /
DHBP Synthase in E. coli

To produce recombinant higher plant GTP cyclohydrolase II / 3,4-dihydroxy-2- 
butanone-4-phosphate synthase in E. coli, a translational fusion of the Arabidopsis GTP
cyclohydrolase II / 3,4-dihydroxy-2-butanone-4-phosphate synthase cDNA (SEQ ID NO:13)
to the 5' end of the thioredoxin gene (LaVallie et al., Biotechnology 11: 187-193 (1992)) was
created in pET-32a (Novagen, Inc., Madison, WI), using a two step PCR approach.
Synthetic oligonucleotide primers DG-392a (SEQ ID NO:17) and DG-393a (SEQ ID NO:18)
were used in a polymerase chain reaction to amplify a DNA fragment of 939-bp in length.
The PCR product was digested with NcoI and EcoRI. The digestion products were
separated on a low-gelling-temperature agarose gel and the fragments were excised. In
parallel, plasmid pET32a was digested with NcoI and EcoRI. The digestion products were
separated on a gel, and the pET32a vector was excised from the gel. The vector fragment
was ligated to the PCR generated fragment, and the ligation products were transformed into
competent E. coli XL1 Blue cells (Stratagene, La Jolla, CA).

Ampicillin-resistant colonies were selected, cultured, and their plasmid DNAs 
extracted. The structures of the plasmids were confirmed by sequencing with the chain
termination method using dideoxy terminators labeled with fluorescent dyes (Applied
Biosystems, Inc., Foster City, CA). The recombinant plasmid with expected structure was
designated pET32aGTP-1.

Synthetic oligonucleotide primers DG-390a (SEQ ID NO:19) and DG-391a (SEQ ID
NO:20) were then used in a polymerase chain reaction to amplify a DNA fragment of 662-bp.
The PCR product was digested with NcoI. The digestion products were separated on a
low-gelling-temperature agarose gel and the fragments were excised. In parallel, plasmid
pET32aGTP-1 was digested with NcoI. The digestion products were separated on a gel,
and the pET32aGTP-1 vector was excised from the gel. The vector fragment was ligated to
the PCR generated fragment, and the ligation products were transformed into competent
E. coli XL1 Blue cells (Stratagene, La Jolla, CA).

Ampicillin-resistant colonies were selected, cultured, and their plasmid DNAs 
extracted. The structures of the plasmids were confirmed by sequencing with the chain
termination method using dideoxy terminators labeled with fluorescent dyes (Applied
Biosystems, Inc., Foster City, CA). The recombinant plasmid with expected structure was
designated pET32aGTP-2. 

Plasmid pET32aGTP-2 was transformed into competent E. coli BL21(DE3) cells, and
recombinant protein was expressed and purified according to the manufacturer's
instructions (pET System Manual, Novagen, Inc., Madison, WI). The resulting fusion
proteins produced by this strain contained approximately 132 amino acids of E. coli
thioredoxin protein, His-Tag, and thrombin cleavage site, followed by the presumptive
mature coding sequence for Arabidopsis GTP cyclohydrolase II / 3,4-dihydroxy-2-butanone- 
4-phosphate synthase. 

Example 14: In vitro Recombination of Riboflavin Biosynthesis Genes by DNA Shuffling

A plant riboflavin biosynthesis gene (e.g., SEQ ID NO:1 or SEQ ID NO:13) encoding a
riboflavin biosynthesis protein (e.g., SEQ ID NO:2 or SEQ ID NO:14, respectively) is
amplified by PCR. The resulting DNA fragment is digested by DNaseI treatment essentially
as described (Stemmer et al. (1994) PNAS 91: 10747-10751) and the PCR primers are
removed from the reaction mixture. A PCR reaction is carried out without primers and is
followed by a PCR reaction with the primers, both as described (Stemmer et al. (1994)
PNAS 91: 10747-10751). The resulting DNA fragments are cloned into pTRC99a
(Pharmacia, Cat no: 27-5007-01) and transformed into a dioxygenase mutant host, e.g. by
electroporation using the Biorad Gene Pulser and the manufacturer's conditions. The
transformed host is grown on medium that contains inhibitory concentrations of an inhibitor 
selected according to a method described above, and those colonies that grow in the 
presence of the inhibitor are selected. Colonies that grow in the presence of normally 
inhibitory concentrations of inhibitor are picked and purified by repeated restreaking. Their 
plasmids are purified and the DNA sequences of cDNA inserts from plasmids that pass this
test are then determined.

In a similar reaction, PCR-amplified DNA fragments comprising a plant riboflavin 
biosynthesis gene of the invention encoding a riboflavin biosynthesis protein and PCR- 
amplified DNA fragments comprising a riboflavin biosynthesis gene from a different host are 
recombined in vitro and resulting variants with improved tolerance to the inhibitor are 
recovered as described above. 
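
The recombination of two homologous genes described in this example can be pictured with a small in silico toy model. The sketch below is an illustration only and is not the laboratory procedure: it assumes two already aligned parental sequences of equal length and simply switches template at random crossover points, whereas actual shuffling reassembles random DNaseI fragments by homology. The function name `toy_chimera`, the placeholder sequences, and the number of crossovers are all hypothetical.

```python
import random


def toy_chimera(parent_a: str, parent_b: str, n_crossovers: int,
                rng: random.Random) -> str:
    """Build one chimeric sequence by alternating between two aligned parents.

    Toy model only: it switches parent at random crossover points and does
    not simulate fragmentation, annealing, or polymerase extension.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("toy model assumes aligned parents of equal length")
    if not 0 <= n_crossovers < len(parent_a):
        raise ValueError("n_crossovers must be between 0 and len(parent) - 1")
    # Distinct interior positions at which the template switches.
    points = sorted(rng.sample(range(1, len(parent_a)), n_crossovers))
    parents = (parent_a, parent_b)
    current = rng.randrange(2)  # which parent contributes the first segment
    pieces, start = [], 0
    for point in points + [len(parent_a)]:
        pieces.append(parents[current][start:point])
        current = 1 - current  # switch to the other parent
        start = point
    return "".join(pieces)


rng = random.Random(0)
plant_gene = "A" * 30   # placeholder standing in for the plant gene
other_gene = "C" * 30   # placeholder standing in for the gene from a different host
chimera = toy_chimera(plant_gene, other_gene, n_crossovers=3, rng=rng)
print(chimera)
```

Each printed chimera is a mosaic of the two placeholders; in the example above, such variants would then be screened for growth in the presence of the inhibitor.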

Example 15: In vitro Recombination of Riboflavin Biosynthesis Genes by Staggered Extension Process 

A plant riboflavin biosynthesis gene (e.g., SEQ ID NO:1 or SEQ ID NO:13) encoding a 
riboflavin biosynthesis protein (e.g., SEQ ID NO:2 or SEQ ID NO:14, respectively) and a 
corresponding riboflavin biosynthesis gene from a different host are each cloned into the 
polylinker of a pBluescript vector. A PCR reaction is carried out essentially as described 
(Zhao et al. (1998) Nature Biotechnology 16: 258-261) using the "reverse primer" and the 
"M13 -20 primer" (Stratagene Catalog). Amplified PCR fragments are digested with 
appropriate restriction enzymes and cloned into pTRC99a, and mutated riboflavin 
biosynthesis genes are screened as described in Example 14. 

Various modifications of the invention described herein will become apparent to those 
skilled in the art. Such modifications are intended to fall within the scope of the appended 
claims. 

What Is Claimed Is: 

1. A DNA molecule comprising a nucleotide sequence isolated from a plant that encodes 
an enzyme involved in riboflavin biosynthesis, wherein the enzyme has lumazine synthase 
activity or bifunctional GTP cyclohydrolase II / DHBP synthase activity. 

2. A DNA molecule according to claim 1, wherein the enzyme has lumazine synthase 
activity. 

3. A DNA molecule according to claim 2, wherein the enzyme comprises an amino acid 
sequence substantially similar to the amino acid sequence set forth in SEQ ID NO:2. 

4. A DNA molecule according to claim 2, wherein the enzyme comprises the amino acid 
sequence set forth in SEQ ID NO:2. 

5. A DNA molecule comprising a nucleotide sequence isolated from a plant that encodes 
an enzyme having lumazine synthase activity, wherein said DNA molecule hybridizes to a 
DNA molecule according to claim 4 under the following conditions: hybridization at 7% 
sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM EDTA at 50°C; wash with 2X 
SSC, 1% SDS, at 50°C. 

6. A DNA molecule according to claim 2, wherein said DNA molecule hybridizes to the 
coding sequence set forth in SEQ ID NO:1 under the following conditions: hybridization at 
7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM EDTA at 50°C; wash with 2X 
SSC, 1% SDS, at 50°C. 

7. A DNA molecule according to claim 2, wherein said DNA molecule comprises a 20 
base pair nucleotide portion identical in sequence to a consecutive 20 base pair portion of 
the coding sequence set forth in SEQ ID NO:1. 
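
The 20-base identity criterion recited in the claim above can be checked computationally. The following is a minimal sketch, assuming both sequences are given as plain strings read on the same strand; the function name `shared_window` is hypothetical, the reference string is only the first 30 nucleotides of the coding sequence as printed in the sequence listing, and the candidate is an invented test sequence.

```python
from typing import Optional


def shared_window(candidate: str, reference: str, k: int = 20) -> Optional[str]:
    """Return the first k-base window of `candidate` found in `reference`.

    Returns None when no window of k consecutive bases is shared.
    Only the given strand is compared (an assumption of this sketch).
    """
    candidate, reference = candidate.upper(), reference.upper()
    # All consecutive k-base windows of the reference sequence.
    ref_windows = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    for i in range(len(candidate) - k + 1):
        window = candidate[i:i + k]
        if window in ref_windows:
            return window
    return None


# First 30 nucleotides of the SEQ ID NO:1 coding sequence (positions 35-64).
reference = "ATGAAGTCATTAGCTTCGCCGCCGTGTCTC"
# Invented candidate carrying 20 consecutive reference bases between flanks.
candidate = "GGG" + reference[5:25] + "TTT"

print(shared_window(candidate, reference))
print(shared_window("A" * 40, reference))
```

Building a set of reference windows keeps each lookup cheap, which matters little here but scales to a full-length coding sequence.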

8. A DNA molecule according to claim 2, comprising the coding sequence set forth in SEQ 
ID NO:1. 

9. A DNA molecule according to claim 1, wherein the enzyme has bifunctional GTP 
cyclohydrolase II / DHBP synthase activity. 

10. A DNA molecule according to claim 9, wherein the enzyme comprises an amino acid 
sequence substantially similar to the amino acid sequence set forth in SEQ ID NO:14. 

11. A DNA molecule according to claim 9, wherein the enzyme comprises the amino acid 
sequence set forth in SEQ ID NO:14. 

12. A DNA molecule comprising a nucleotide sequence isolated from a plant that encodes 
an enzyme having bifunctional GTP cyclohydrolase II / DHBP synthase activity, wherein 
said DNA molecule hybridizes to a DNA molecule according to claim 11 under the following 
conditions: hybridization at 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM 
EDTA at 50°C; wash with 2X SSC, 1% SDS, at 50°C. 

13. A DNA molecule according to claim 9, wherein said DNA molecule hybridizes to the 
coding sequence set forth in SEQ ID NO:13 under the following conditions: hybridization at 
7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM EDTA at 50°C; wash with 2X 
SSC, 1% SDS, at 50°C. 

14. A DNA molecule according to claim 9, wherein said DNA molecule comprises a 20 
base pair nucleotide portion identical in sequence to a consecutive 20 base pair portion of 
the coding sequence set forth in SEQ ID NO:13. 

15. A DNA molecule according to claim 9, comprising the coding sequence set forth in SEQ 
ID NO:13. 

16. A chimeric gene comprising a promoter operatively linked to a DNA molecule according 
to claim 1. 

17. A recombinant vector comprising a chimeric gene according to claim 16, wherein said 
vector is capable of being stably transformed into a host cell. 

18. A host cell comprising a vector according to claim 17, wherein said host cell is capable 
of expressing the DNA molecule encoding an enzyme involved in riboflavin biosynthesis. 

19. A host cell according to claim 18, wherein said host cell is selected from the group 
consisting of a bacterial cell, a yeast cell, and a plant cell. 

20. A host cell according to claim 19, which is a bacterial cell. 

21. A process for producing nucleotide sequences encoding gene products having 
altered lumazine synthase activity comprising: 

(a) shuffling a DNA molecule according to claim 2; 

(b) expressing the resulting shuffled nucleotide sequences; and 

(c) selecting for altered lumazine synthase activity as compared to the activity of an 
enzyme encoded by a DNA molecule according to claim 2. 

22. The process of claim 21, wherein the nucleotide sequence is SEQ ID NO:1. 

23. A shuffled DNA molecule obtainable by the process of claim 22. 

24. A shuffled DNA molecule according to claim 23, wherein said shuffled DNA molecule 
encodes an enzyme having enhanced tolerance to an inhibitor of lumazine synthase 
activity. 

25. A chimeric gene comprising a promoter operatively linked to a shuffled DNA molecule 
according to claim 23. 

26. A recombinant vector comprising a chimeric gene according to claim 25, wherein said 
vector is capable of being stably transformed into a host cell. 

27. A host cell comprising a vector according to claim 26. 

28. A host cell according to claim 27, wherein said host cell is selected from the group 
consisting of a bacterial cell, a yeast cell, and a plant cell. 

29. A host cell according to claim 28, wherein said host cell is a plant cell. 

30. A plant or seed comprising a plant cell according to claim 29. 

31. A plant according to claim 30, wherein said plant is tolerant to an inhibitor of lumazine 
synthase activity. 

32. A process for producing nucleotide sequences encoding gene products having 
altered bifunctional GTP cyclohydrolase II / DHBP synthase activity comprising: 

(a) shuffling a DNA molecule according to claim 9; 

(b) expressing the resulting shuffled nucleotide sequences; and 

(c) selecting for altered bifunctional GTP cyclohydrolase II / DHBP synthase activity 
as compared to the activity of an enzyme encoded by a DNA molecule according to claim 9. 

33. The process of claim 32, wherein the nucleotide sequence is SEQ ID NO: 13. 

34. A shuffled DNA molecule obtainable by the process of claim 33. 

35. A shuffled DNA molecule according to claim 34, wherein said shuffled DNA molecule 
encodes an enzyme having enhanced tolerance to an inhibitor of bifunctional GTP 
cyclohydrolase II / DHBP synthase activity. 

36. A chimeric gene comprising a promoter operatively linked to a shuffled DNA molecule 
according to claim 34. 

37. A recombinant vector comprising a chimeric gene according to claim 36, wherein said 
vector is capable of being stably transformed into a host cell. 

38. A host cell comprising a vector according to claim 37. 

39. A host cell according to claim 38, wherein said host cell is selected from the group 
consisting of a bacterial cell, a yeast cell, and a plant cell. 

40. A host cell according to claim 39, wherein said host cell is a plant cell. 

41. A plant or seed comprising a plant cell according to claim 40. 

42. A plant according to claim 41, wherein said plant is tolerant to an inhibitor of 
bifunctional GTP cyclohydrolase II / DHBP synthase activity. 

43. An isolated plant enzyme involved in riboflavin biosynthesis, wherein said enzyme has 
lumazine synthase activity or bifunctional GTP cyclohydrolase II / DHBP synthase activity. 

44. An enzyme according to claim 43, wherein said enzyme has lumazine synthase activity. 

45. An enzyme according to claim 44, wherein said enzyme comprises an amino acid 
sequence substantially similar to the amino acid sequence set forth in SEQ ID NO:2. 

46. An enzyme according to claim 44, wherein said enzyme comprises the amino acid 
sequence set forth in SEQ ID NO:2. 

47. An enzyme according to claim 43, wherein said enzyme has bifunctional GTP 
cyclohydrolase II / DHBP synthase activity. 

48. An enzyme according to claim 47, wherein said enzyme comprises an amino acid 
sequence substantially similar to the amino acid sequence set forth in SEQ ID NO:14. 

49. An enzyme according to claim 47, wherein said enzyme comprises the amino acid 
sequence set forth in SEQ ID NO:14. 

50. A method for screening a chemical for the ability to inhibit lumazine synthase activity, 
comprising the steps of: 

(a) combining an enzyme according to claim 44 in a first reaction mixture with 2,4- 
dioxy-5-amino-6-ribitylamino-pyrimidine and 3,4-dihydroxy-2-butanone 
phosphate under conditions in which the enzyme is capable of catalyzing the 
synthesis of lumazine; 

(b) combining the chemical and the enzyme in a second reaction mixture with 2,4- 
dioxy-5-amino-6-ribitylamino-pyrimidine and 3,4-dihydroxy-2-butanone 
phosphate under the same conditions as in the first reaction mixture; 

(c) determining the amounts of lumazine produced in the first and second reaction 
mixtures; and 

(d) comparing the amounts of lumazine produced in the first and second reaction 
mixtures; 

wherein the chemical is capable of inhibiting the lumazine synthase activity of the enzyme if 
the amount of lumazine produced in the second reaction mixture is significantly less than 
the amount of lumazine produced in the first reaction mixture. 
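
The comparison in steps (c) and (d) above amounts to computing how much less product forms in the second reaction mixture than in the first. The sketch below is one possible way to express that comparison, not a procedure taken from the specification: the function names, the replicate readings, and the 50% cutoff used to stand in for "significantly less" are all assumptions made for illustration.

```python
from statistics import mean
from typing import Sequence


def percent_inhibition(control_signal: float, treated_signal: float) -> float:
    """Percent reduction of product signal relative to the uninhibited control."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * (control_signal - treated_signal) / control_signal


def is_inhibitor(control_replicates: Sequence[float],
                 treated_replicates: Sequence[float],
                 min_percent: float = 50.0) -> bool:
    """Flag a chemical when mean inhibition reaches a chosen cutoff.

    The 50% default is a hypothetical threshold; the claim itself only
    requires that the second amount be significantly less than the first.
    """
    control_mean = mean(control_replicates)   # first reaction mixture
    treated_mean = mean(treated_replicates)   # second reaction mixture
    return percent_inhibition(control_mean, treated_mean) >= min_percent


# Invented fluorescence readings in arbitrary units.
print(percent_inhibition(100.0, 25.0))
print(is_inhibitor([100.0, 110.0, 90.0], [20.0, 30.0, 25.0]))
print(is_inhibitor([100.0, 100.0], [95.0, 97.0]))
```

In practice a statistical test on the replicates could replace the fixed cutoff; the fixed cutoff is used here only to keep the sketch short.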

51. A method according to claim 50, wherein the first reaction mixture comprises 50 µM 2,4- 
dioxy-5-amino-6-ribitylamino-pyrimidine, and 0.5 mM 3,4-dihydroxy-2-butanone phosphate. 

52. A method according to claim 50, wherein the amounts of lumazine produced in the 
reaction mixtures are determined using a fluorimeter at an excitation wavelength of 407 nm. 

53. A chemical identified by the screening method of claim 50. 

54. A method for suppressing the growth of a plant, comprising applying to the plant the 
chemical of claim 53, whereby the chemical inhibits the activity of lumazine synthase in the 
plant. 

55. A method for screening a chemical for the ability to inhibit bifunctional GTP 
cyclohydrolase II / DHBP synthase activity, comprising the steps of: 

(a) combining an enzyme according to claim 47 in a first reaction mixture with GTP 
or ribulose-5-phosphate under conditions in which the enzyme is capable of 
catalyzing the synthesis of 2,5-diamino-4-oxy-6-ribosylamino-pyrimidine-5'- 
phosphate or 3,4-dihydroxy-2-butanone phosphate, respectively; 

(b) combining the chemical and the enzyme in a second reaction mixture with GTP 
or ribulose-5-phosphate under the same conditions as in the first reaction 
mixture; 

(c) determining the amounts of 2,5-diamino-4-oxy-6-ribosylamino-pyrimidine-5'- 
phosphate or 3,4-dihydroxy-2-butanone phosphate produced in the first and 
second reaction mixtures; and 

(d) comparing the amounts of 2,5-diamino-4-oxy-6-ribosylamino-pyrimidine-5'- 
phosphate or 3,4-dihydroxy-2-butanone phosphate produced in the first and 
second reaction mixtures; 

wherein the chemical is capable of inhibiting the bifunctional GTP cyclohydrolase II / DHBP 
synthase activity of the enzyme if the amount of 2,5-diamino-4-oxy-6-ribosylamino- 
pyrimidine-5'-phosphate or 3,4-dihydroxy-2-butanone phosphate produced in the second 
reaction mixture is significantly less than the amount of 2,5-diamino-4-oxy-6-ribosylamino- 
pyrimidine-5'-phosphate or 3,4-dihydroxy-2-butanone phosphate produced in the first 
reaction mixture. 

56. A chemical identified by the screening method of claim 55. 

57. A method for suppressing the growth of a plant, comprising applying to the plant the 
chemical of claim 56, whereby the chemical inhibits the activity of GTP cyclohydrolase II / 
DHBP synthase in the plant. 

58. A plant, plant cell, plant seed, or plant tissue comprising a DNA molecule comprising a 
nucleotide sequence isolated from a plant that encodes an enzyme involved in riboflavin 
biosynthesis, wherein the enzyme has lumazine synthase activity or bifunctional GTP 
cyclohydrolase II / DHBP synthase activity, and wherein the DNA molecule confers upon 
said plant, plant cell, plant seed, or plant tissue tolerance to a herbicide in amounts that 
normally inhibit riboflavin biosynthesis. 

59. A plant, plant cell, plant seed, or plant tissue according to claim 58, wherein the 
enzyme has lumazine synthase activity, and wherein the DNA molecule confers upon the 
plant, plant cell, plant seed, or plant tissue tolerance to a herbicide in amounts that inhibit 
naturally occurring lumazine synthase activity. 

60. A plant, plant cell, plant seed, or plant tissue according to claim 59, wherein the 
enzyme comprises an amino acid sequence substantially similar to the amino acid 
sequence set forth in SEQ ID NO:2. 

61. A plant, plant cell, plant seed, or plant tissue according to claim 59, wherein the DNA 
molecule hybridizes to the coding sequence set forth in SEQ ID NO:1 under the following 
conditions: hybridization at 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM 
EDTA at 50°C; wash with 2X SSC, 1% SDS, at 50°C. 

62. A plant, plant cell, plant seed, or plant tissue according to claim 58, wherein the 
enzyme has bifunctional GTP cyclohydrolase II / DHBP synthase activity, and wherein the 
DNA molecule confers upon the plant, plant cell, plant seed, or plant tissue tolerance to a 
herbicide in amounts that inhibit naturally occurring bifunctional GTP cyclohydrolase II / 
DHBP synthase activity. 

63. A plant, plant cell, plant seed, or plant tissue according to claim 62, wherein the 
enzyme comprises an amino acid sequence substantially similar to the amino acid 
sequence set forth in SEQ ID NO:14. 

64. A plant, plant cell, plant seed, or plant tissue according to claim 62, wherein the DNA 
molecule hybridizes to the coding sequence set forth in SEQ ID NO:13 under the following 
conditions: hybridization at 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM 
EDTA at 50°C; wash with 2X SSC, 1% SDS, at 50°C. 

65. A method for selectively suppressing the growth of weeds in a field containing a crop of 
planted crop seeds or plants, comprising the steps of: 

(a) planting herbicide tolerant crops or crop seeds, which are plants or plant seeds 
according to claim 59; and 

(b) applying to the crops or crop seeds and the weeds in the field a herbicide in 
amounts that inhibit naturally occurring lumazine synthase activity, wherein the 
herbicide suppresses the growth of the weeds without significantly suppressing 
the growth of the crops. 

66. A method for selectively suppressing the growth of weeds in a field containing a crop of 
planted crop seeds or plants, comprising the steps of: 

(a) planting herbicide tolerant crops or crop seeds, which are plants or plant seeds 
according to claim 39; and 

(b) applying to the crops or crop seeds and the weeds in the field a herbicide in 
amounts that inhibit naturally occurring bifunctional GTP cyclohydrolase II / 
DHBP synthase activity, wherein the herbicide suppresses the growth of the 
weeds without significantly suppressing the growth of the crops. 

SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: Novartis AG 

(B) STREET: Schwarzwaldallee 215 

(C) CITY: Basel 

(E) COUNTRY: Switzerland 

(F) POSTAL CODE (ZIP): 4058 

(G) TELEPHONE: +41 61 324 11 11 

(H) TELEFAX: +41 61 322 75 32 

(ii) TITLE OF INVENTION: RIBOFLAVIN BIOSYNTHESIS GENES FROM 
PLANTS AND USES THEREOF 

(iii) NUMBER OF SEQUENCES: 20 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 991 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 35..718 

(D) OTHER INFORMATION: /product= "lumazine synthase" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

AGAGAACCGT CTCTAAAACT CCGACGAACG AAAA ATG AAG TCA TTA GCT TCG 
52 

Met Lys Ser Leu Ala Ser 
1 5 

CCG CCG TGT CTC CGC CTG ATA CCG ACG GCA CAC CGT CAG CTC AAT TCG 
100 

Pro Pro Cys Leu Arg Leu Ile Pro Thr Ala His Arg Gln Leu Asn Ser 
10 15 20 

CGT CAA TCT TCC TCC GCC TGT TAT ATA CAC GGT GGC TCT TCT GTG AAC 
148 

Arg Gln Ser Ser Ser Ala Cys Tyr Ile His Gly Gly Ser Ser Val Asn 
25 30 

AAA TCC AAT AAT CTC TCA TTC TCC TCA TCC ACA TCC GGA TTT GCG TCA 
196 

Lys Ser Asn Asn Leu Ser Phe Ser Ser Ser Thr Ser Gly Phe Ala Ser 
40 45 50 

CCA CTA GCT GTA GAG AAG GAA TTA CGC TCT TCA TTC GTA CAG ACG GCT 
244 

Pro Leu Ala Val Glu Lys Glu Leu Arg Ser Ser Phe Val Gln Thr Ala 
55 60 65 70 



GCT GTT CGC CAT GTT ACG GGG TCT CTT ATC AGA GGC GAA GGT CTT AGA 
292 

Ala Val Arg His Val Thr Gly Ser Leu Ile Arg Gly Glu Gly Leu Arg 
75 80 85 

TTC GCC ATC GTG GTA GCT CGT TTC AAT GAG GTT GTG ACT AAG TTG CTT 
340 

Phe Ala Ile Val Val Ala Arg Phe Asn Glu Val Val Thr Lys Leu Leu 
90 95 100 

TTG GAA GGA GCG ATT GAG ACT TTC AAG AAG TAT TCA GTC AGA GAA GAA 
388 

Leu Glu Gly Ala Ile Glu Thr Phe Lys Lys Tyr Ser Val Arg Glu Glu 
105 110 

GAG ATT GAA GTT ATT TGG GTT CCT GGC AGC TTT GAA ATT GGT GTT GTT 
436 

Asp Ile Glu Val Ile Trp Val Pro Gly Ser Phe Glu Ile Gly Val Val 
120 125 130 

GCA CAA AAT CTT GGG AAA TCG GGA AAA TTT CAT GCT GTT TTA TGT ATC 
484 

Ala Gln Asn Leu Gly Lys Ser Gly Lys Phe His Ala Val Leu Cys Ile 
135 140 145 150 

GGC GCT GTG ATA AGA GGA GAT ACC ACA CAT TAT GAT GCT GTT GCC AAC 
532 

Gly Ala Val Ile Arg Gly Asp Thr Thr His Tyr Asp Ala Val Ala Asn 
155 160 165 

TCT GCT GCG TCT GGA GTA CTT TCT GCT AGC ATA AAT TCA GGC GTT CCA 
580 

Ser Ala Ala Ser Gly Val Leu Ser Ala Ser Ile Asn Ser Gly Val Pro 
170 175 180 

TGC ATA TTT GGT GTA CTG ACT TGC GAG GAC ATG GAT CAG GCT CTG AAT 
628 

Cys Ile Phe Gly Val Leu Thr Cys Glu Asp Met Asp Gln Ala Leu Asn 
185 190 195 

CGA TCT GGT GGC AAA GCC GGC AAT AAG GGA GCT GAA ACT GCT TTG ACG 
676 

Arg Ser Gly Gly Lys Ala Gly Asn Lys Gly Ala Glu Thr Ala Leu Thr 
200 205 210 



GCG CTC GAA ATG GCG TCG TTG TTT GAG CAC CAC CTG AAA TAG 
718 

Ala Leu Glu Met Ala Ser Leu Phe Glu His His Leu Lys * 
215 220 225 

CTCGGCTCGT TCGATGGATG AACATGATCA CGTATGAGAA CCTCTTGATG TTGTCCCATT 
778 

TGGTTACAAT CCAGTCTCTG AAATTGTTTG TACCTCAAAG ATTGTCCAAA TGTTTTACCC 
838 

TTGGTTACCA AATCAATTAA ACGCTTTTGT AAGCTTCTGG CCTTGTTTTT TTTTTTTGAA 
898 

TCGTATGATA ATAATAATTC CTCCGAATTT TGGGGTCTTT CTGTACTAAT CAAAAATGTG 
958 

ATCTTCTTTG TTGTAAAAAA AAAAAAAAAA AAA 
991 
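
The CDS feature of SEQ ID NO: 1 (positions 35..718) can be sanity-checked against the printed translation. The sketch below is an illustration under stated assumptions: it uses only the first 52 nucleotides exactly as printed in the listing, the standard genetic code, and helper names (`CODON_TABLE`, `translate`) that are hypothetical and not part of the listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate complete codons of `dna` into one-letter amino acid codes."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


CDS_START, CDS_END = 35, 718  # 1-based, inclusive, as given in the feature table

# Nucleotides 1-52 of SEQ ID NO: 1 as printed in the listing.
FIRST_52 = ("AGAGAACCGT" "CTCTAAAACT" "CCGACGAACG" "AAAA"
            "ATGAAGTCATTAGCTTCG")

cds_length = CDS_END - CDS_START + 1
codon_count = cds_length // 3  # includes the terminal TAG stop codon

print(cds_length % 3 == 0, codon_count)
print(translate(FIRST_52[CDS_START - 1:]))
```

The translated prefix corresponds to Met Lys Ser Leu Ala Ser, the first six residues printed under the nucleotide sequence.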

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 228 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Lys Ser Leu Ala Ser Pro Pro Cys Leu Arg Leu Ile Pro Thr Ala 
1 5 10 15 

His Arg Gln Leu Asn Ser Arg Gln Ser Ser Ser Ala Cys Tyr Ile His 
20 25 30 

Gly Gly Ser Ser Val Asn Lys Ser Asn Asn Leu Ser Phe Ser Ser Ser 
35 40 45 

Thr Ser Gly Phe Ala Ser Pro Leu Ala Val Glu Lys Glu Leu Arg Ser 
50 55 60 

Ser Phe Val Gln Thr Ala Ala Val Arg His Val Thr Gly Ser Leu Ile 
65 70 75 80 

Arg Gly Glu Gly Leu Arg Phe Ala Ile Val Val Ala Arg Phe Asn Glu 
85 90 95 

Val Val Thr Lys Leu Leu Leu Glu Gly Ala Ile Glu Thr Phe Lys Lys 
100 105 110 

Tyr Ser Val Arg Glu Glu Asp Ile Glu Val Ile Trp Val Pro Gly Ser 
115 120 125 

Phe Glu Ile Gly Val Val Ala Gln Asn Leu Gly Lys Ser Gly Lys Phe 
130 135 140 

His Ala Val Leu Cys Ile Gly Ala Val Ile Arg Gly Asp Thr Thr His 
145 150 155 160 

Tyr Asp Ala Val Ala Asn Ser Ala Ala Ser Gly Val Leu Ser Ala Ser 
165 170 175 

Ile Asn Ser Gly Val Pro Cys Ile Phe Gly Val Leu Thr Cys Glu Asp 
180 185 190 

Met Asp Gln Ala Leu Asn Arg Ser Gly Gly Lys Ala Gly Asn Lys Gly 
195 200 205 

Ala Glu Thr Ala Leu Thr Ala Leu Glu Met Ala Ser Leu Phe Glu His 
210 215 220 

His Leu Lys * 
225 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "DG-63" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

ATTTTGTAAC CAAGGG 
16 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "DG-65" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

GGCAATAAGG GAGCTG 
16 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "JG-L" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

GTACCTCGAG TCTAGACTCG AG 
22 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "RS-1" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

AGCTACCATG GGAGGTTCTC ATACGTG 
27 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "RS-2" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

AGCTAGAGCT CACGAGAGAA CCGTCTC 
27 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Cys Ile Gly Ala Val Ile Arg Gly Asp Thr Thr 
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Lys Ala Gly Asn Lys Gly Ala Glu Thr Ala Leu Thr Ala Leu Glu Met 
1 5 10 15 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "DG-252" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

GATCCCATGG CTAAGTCATT AGCTTCGCCG 
30 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






PCT/EP99/00556 

WO 99/38986 



(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "DG-253" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

ATCGCCATGG CTGTTCGCCA TGTTACG 
27 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "DG-254" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

CAGTGAATTC CTAGAGCTAT TTCAGGTGGT G 
31 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1665 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 2..1432 

(D) OTHER INFORMATION: /product= "bifunctional GTP 
cyclohydrolase II / DHBP synthase" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

C TCA TTC ACC AAC GGA AAC ACT CCT CTC TCA AAT GGG TCT CTC ATT 
46 

Ser Phe Thr Asn Gly Asn Thr Pro Leu Ser Asn Gly Ser Leu Ile 
1 5 10 15 

GAT GAT CGG ACC GAA GAG CCA TTA GAG GCT GAT TCG GTT TCA CTT GGA 
94 

Asp Asp Arg Thr Glu Glu Pro Leu Glu Ala Asp Ser Val Ser Leu Gly 
20 25 30 

ACA CTT GCT GCT GAT TCT GCT CCT GCA CCA GCC AAT GGT TTT GTT GCT 
142 

Thr Leu Ala Ala Asp Ser Ala Pro Ala Pro Ala Asn Gly Phe Val Ala 
35 40 45 

GAA GAT GAT GAC TTT GAG TTG GAT TTA CCA ACT CCT GGT TTC TCT TCT 
190 

Glu Asp Asp Asp Phe Glu Leu Asp Leu Pro Thr Pro Gly Phe Ser Ser 
50 55 60 

ATC CCT GAG GCC ATT GAA GAT ATA CGC CAA GGA AAG CTT GTG GTG GTT 
238 

Ile Pro Glu Ala Ile Glu Asp Ile Arg Gln Gly Lys Leu Val Val Val 
65 70 75 

GTG GAT GAT GAA GAT AGG GAA AAT GAA GGG GAT TTG GTG ATG GCT GCT 
286 

Val Asp Asp Glu Asp Arg Glu Asn Glu Gly Asp Leu Val Met Ala Ala 
80 85 90 95 



CAG TTA GCA ACA CCT GAA GCT ATG GCT TTT ATT GTG AGA CAT GGA ACT 
334 

Gln Leu Ala Thr Pro Glu Ala Met Ala Phe Ile Val Arg His Gly Thr 
100 105 110 

GGG ATA GTT TGT GTG AGC ATG AAA GAA GAT GAT CTC GAG AGG TTG CAC 
382 

Gly Ile Val Cys Val Ser Met Lys Glu Asp Asp Leu Glu Arg Leu His 
115 120 125 

CTT CCT CTA ATG GTG AAT CAG AAG GAA AAC GAA GAA AAG CTC TCT ACT 
430 

Leu Pro Leu Met Val Asn Gln Lys Glu Asn Glu Glu Lys Leu Ser Thr 
130 135 140 

GCA TTT ACA GTG ACT GTG GAT GCA AAA CAT GGC ACA ACA ACG GGA GTA 
478 

Ala Phe Thr Val Thr Val Asp Ala Lys His Gly Thr Thr Thr Gly Val 
145 150 155 

TCA GCT CGT GAC AGG GCA ACA ACC ATA TTG TCT CTT GCA TCA AGA GAT 
526 

Ser Ala Arg Asp Arg Ala Thr Thr Ile Leu Ser Leu Ala Ser Arg Asp 
160 165 170 175 

TCA AAG CCT GAG GAT TTC AAT CGT CCA GGT CAT ATC TTC CCA CTG AAG 
574 

Ser Lys Pro Glu Asp Phe Asn Arg Pro Gly His Ile Phe Pro Leu Lys 
180 185 190 

TAT CGG GAA GGT GGG GTT CTG AAA AGG GCT GGA CAC ACT GAA GCA TCT 
622 

Tyr Arg Glu Gly Gly Val Leu Lys Arg Ala Gly His Thr Glu Ala Ser 
195 200 205 

GTT GAT CTC ACT GTT TTA GCT GGA CTG GAT CCT GTT GGA GTA CTT TGT 
670 

Val Asp Leu Thr Val Leu Ala Gly Leu Asp Pro Val Gly Val Leu Cys 
210 215 220 

GAA ATT GTT GAT GAT GAT GGT TCC ATG GCT AGA TTA CCA AAA CTT CGT 
718 

Glu Ile Val Asp Asp Asp Gly Ser Met Ala Arg Leu Pro Lys Leu Arg 
225 230 235 

GAA TTT GCC GCC GAG AAC AAC CTG AAA GTT GTT TCC ATC GCA GAT TTG 
766 

Glu Phe Ala Ala Glu Asn Asn Leu Lys Val Val Ser Ile Ala Asp Leu 
240 245 250 255 

ATC AGG TAT AGA AGA AAG AGA GAT AAA TTA GTG GAA CGT GCT TCT GCG 
814 

Ile Arg Tyr Arg Arg Lys Arg Asp Lys Leu Val Glu Arg Ala Ser Ala 
260 265 270 

GCT CGG ATC CCA ACA ATG TGG GGA CCT TTC ACT GCT TAC TGC TAT AGG 
862 

Ala Arg Ile Pro Thr Met Trp Gly Pro Phe Thr Ala Tyr Cys Tyr Arg 
275 280 285 

TCC ATA TTA GAC GGA ATA GAG CAC ATA GCA ATG GTT AAG GGT GAG ATT 
910 

Ser Ile Leu Asp Gly Ile Glu His Ile Ala Met Val Lys Gly Glu Ile 
290 295 300 

GGT GAC GGT CAA GAC ATT CTC GTG AGG GTT CAT TCT GAA TGT CTA ACA 
958 

Gly Asp Gly Gln Asp Ile Leu Val Arg Val His Ser Glu Cys Leu Thr 
305 310 315 

GGG GAC ATA TTT GGG TCT GCA AGG TGT GAT TGC GGG AAC CAG CTA GCA 
1006 

Gly Asp Ile Phe Gly Ser Ala Arg Cys Asp Cys Gly Asn Gln Leu Ala 
320 325 330 335 

CTC TCG ATG CAG CAG ATC GAG GCT ACT GGT CGC GGT GTG CTG GTT TAC 
1054 

Leu Ser Met Gln Gln Ile Glu Ala Thr Gly Arg Gly Val Leu Val Tyr 
340 345 350 

CTA CGT GGA CAT GAA GGA AGA GGG ATC GGT TTA GGA CAC AAG CTT CGA 
1102 

Leu Arg Gly His Glu Gly Arg Gly Ile Gly Leu Gly His Lys Leu Arg 
355 360 365 

GCT TAC AAT CTG CAA GAT GCT GGT CGA GAC ACG GTT GAA GCT AAT GAG 
1150 

Ala Tyr Asn Leu Gln Asp Ala Gly Arg Asp Thr Val Glu Ala Asn Glu 
370 375 380 

GAA TTA GGA CTT CCT GTT GAT TCT AGA GAG TAT GGA ATT GGT GCA CAG 
1198 

Glu Leu Gly Leu Pro Val Asp Ser Arg Glu Tyr Gly Ile Gly Ala Gln 
385 390 395 

ATA ATA AGG GAT TTA GGT GTT AGG ACA ATG AAG CTG ATG ACA AAT AAT 
1246 

Ile Ile Arg Asp Leu Gly Val Arg Thr Met Lys Leu Met Thr Asn Asn 
400 405 410 415 

CCC CCA AAG TAT GTT GGT TTG AAG GGA TAT GGA TTA GCC ATT GTT GGG 
1294 

Pro Pro Lys Tyr Val Gly Leu Lys Gly Tyr Gly Leu Ala Ile Val Gly 
420 425 430 

AGA GTC CCT CTA TTG AGT CTT ATC ACG AAG GAG AAT AAG AGA TAT CTG 
1342 

Arg Val Pro Leu Leu Ser Leu Ile Thr Lys Glu Asn Lys Arg Tyr Leu 
435 440 445 

GAG ACA AAG CGG ACC AAG ATG GGT CAC ATG TAT GGC TTG AAG TTC AAA 
1390 

Glu Thr Lys Arg Thr Lys Met Gly His Met Tyr Gly Leu Lys Phe Lys 
450 455 460 

GGG GAT GTT GTG GAG AAG ATT GAG TCT GAA TCT GAG TCC TAA 
1432 

Gly Asp Val Val Glu Lys Ile Glu Ser Glu Ser Glu Ser * 
465 470 475 

GCTTAAAAAC CAGGACGAAC CGAATGGAAT CAAGAACTAT AGATATAATA CTTCCCAAAA 
1492 

AACAAGGAAA GAAATTGACA CAGAAGAAGA GGAAAAAGAC ATTTGATCTG TCTGAGAAAC 
1552 

TTGATTAGAT TGGTTTATGT TCTAATCTAA TCTGATTTGA TTTTTTTTTA TTTTGTCTAC 
1612 

GATTCTTGAG TTACGAAATG TTCATCATTT GTTAAAAAAA AAAAAAAAAA AAA 
1665 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 477 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Ser Phe Thr Asn Gly Asn Thr Pro Leu Ser Asn Gly Ser Leu Ile Asp 
1 5 10 15 

Asp Arg Thr Glu Glu Pro Leu Glu Ala Asp Ser Val Ser Leu Gly Thr 
20 25 30 

Leu Ala Ala Asp Ser Ala Pro Ala Pro Ala Asn Gly Phe Val Ala Glu 
35 40 45 

Asp Asp Asp Phe Glu Leu Asp Leu Pro Thr Pro Gly Phe Ser Ser Ile 
50 55 60 

Pro Glu Ala Ile Glu Asp Ile Arg Gln Gly Lys Leu Val Val Val Val 
65 70 75 80 

Asp Asp Glu Asp Arg Glu Asn Glu Gly Asp Leu Val Met Ala Ala Gln 
85 90 95 

Leu Ala Thr Pro Glu Ala Met Ala Phe Ile Val Arg His Gly Thr Gly 
100 105 110 

Ile Val Cys Val Ser Met Lys Glu Asp Asp Leu Glu Arg Leu His Leu 
115 120 125 

Pro Leu Met Val Asn Gln Lys Glu Asn Glu Glu Lys Leu Ser Thr Ala 
130 135 140 

Phe Thr Val Thr Val Asp Ala Lys His Gly Thr Thr Thr Gly Val Ser 
145 150 155 160 

Ala Arg Asp Arg Ala Thr Thr Ile Leu Ser Leu Ala Ser Arg Asp Ser 
165 170 175 

Lys Pro Glu Asp Phe Asn Arg Pro Gly His Ile Phe Pro Leu Lys Tyr 
180 185 190 

Arg Glu Gly Gly Val Leu Lys Arg Ala Gly His Thr Glu Ala Ser Val 
195 200 205 

Asp Leu Thr Val Leu Ala Gly Leu Asp Pro Val Gly Val Leu Cys Glu 
210 215 220 

Ile Val Asp Asp Asp Gly Ser Met Ala Arg Leu Pro Lys Leu Arg Glu 
225 230 235 240 

Phe Ala Ala Glu Asn Asn Leu Lys Val Val Ser Ile Ala Asp Leu Ile 
245 250 255 

Arg Tyr Arg Arg Lys Arg Asp Lys Leu Val Glu Arg Ala Ser Ala Ala 
260 265 270 

Arg Ile Pro Thr Met Trp Gly Pro Phe Thr Ala Tyr Cys Tyr Arg Ser 
275 280 285 

Ile Leu Asp Gly Ile Glu His Ile Ala Met Val Lys Gly Glu Ile Gly 
290 295 300 

Asp Gly Gln Asp Ile Leu Val Arg Val His Ser Glu Cys Leu Thr Gly 
305 310 315 320 

Asp Ile Phe Gly Ser Ala Arg Cys Asp Cys Gly Asn Gln Leu Ala Leu 
325 330 335 

Ser Met Gln Gln Ile Glu Ala Thr Gly Arg Gly Val Leu Val Tyr Leu 
340 345 350 

Arg Gly His Glu Gly Arg Gly Ile Gly Leu Gly His Lys Leu Arg Ala 
355 360 365 

Tyr Asn Leu Gln Asp Ala Gly Arg Asp Thr Val Glu Ala Asn Glu Glu 
370 375 380 

Leu Gly Leu Pro Val Asp Ser Arg Glu Tyr Gly Ile Gly Ala Gln Ile 
385 390 395 400 

Ile Arg Asp Leu Gly Val Arg Thr Met Lys Leu Met Thr Asn Asn Pro 
405 410 415 

Pro Lys Tyr Val Gly Leu Lys Gly Tyr Gly Leu Ala Ile Val Gly Arg 
420 425 430 

Val Pro Leu Leu Ser Leu Ile Thr Lys Glu Asn Lys Arg Tyr Leu Glu 
435 440 445 

Thr Lys Arg Thr Lys Met Gly His Met Tyr Gly Leu Lys Phe Lys Gly 
450 455 460 

Asp Val Val Glu Lys Ile Glu Ser Glu Ser Glu Ser * 
465 470 475 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "DG-67" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

GCTAATGAGG AATTAG 
16 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "DG-eS" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

TGATTCCATT CGGTTC 
16 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "DG-392a" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

TGTCTCTTGC ATCAAGAG 
18 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "DG-393a" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

CAGTGAATTC TTAAGCTTAG GACTCAGATT CAG 
33 

(2) INFORMATION FOR SEQ ID NO: 19: 






(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "DG-390a" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

GATCCCATGG GTTTCTCTTC TATCG 
25 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "DG-391a" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

CCGAGCCGCA GAAGCACG 
18 

The DNA substrate molecules to be digested can be either in vivo 
replicated DNA, such as a plasmid preparation, or PCR-amplified nucleic acid 
fragments harboring the restriction enzyme recognition sites of interest, preferably near the 
ends of the fragment. Typically, at least two variants of a gene of interest, each having one 
or more mutations, are digested with at least one restriction enzyme determined to cut within 
the nucleic acid sequence of interest. The restriction fragments are then joined with DNA 
ligase to generate full-length genes having shuffled regions. The number of regions shuffled 
will depend on the number of cuts within the nucleic acid sequence of interest. The shuffled 
molecules can be introduced into cells as described above and screened or selected for a 
desired property as described herein. Nucleic acid can then be isolated from pools 
(libraries) or clones having desired properties and subjected to the same procedure until a 
desired degree of improvement is obtained. 
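
The digestion and ligation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a laboratory protocol: it assumes every variant carries the same recognition sites, treats the cut as falling immediately before each site (real enzymes cut within or near the site and may leave overhangs), and assumes fragments religate only in their original order. The function names `digest` and `shuffle_by_ligation` and the example sequences are hypothetical.

```python
from itertools import product

def digest(seq, site):
    """Cut seq immediately before each occurrence of site (simplified model)."""
    fragments, start = [], 0
    pos = seq.find(site, 1)
    while pos != -1:
        fragments.append(seq[start:pos])
        start = pos
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

def shuffle_by_ligation(variants, site):
    """Return every full-length gene obtainable by taking each region
    from any one of the digested variants."""
    digests = [digest(v, site) for v in variants]
    if len({len(d) for d in digests}) != 1:
        raise ValueError("variants must contain the same number of cut sites")
    regions = zip(*digests)  # all versions of region 1, of region 2, ...
    return sorted({"".join(choice) for choice in product(*regions)})

# Two hypothetical variants, each with one point mutation and one EcoRI site.
variant_a = "ATGAAAGAATTCCCCTAA"
variant_b = "ATGCAAGAATTCCGCTAA"
library = shuffle_by_ligation([variant_a, variant_b], "GAATTC")
```

With one cut site and two variants, the sketch yields four full-length genes: the two parents and two recombinants. Each additional cut site doubles the number of combinations for two variants.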

In some embodiments, at least one DNA substrate molecule or fragment 
thereof is isolated and subjected to mutagenesis. In some embodiments, the pool or library of 
religated restriction fragments is subjected to mutagenesis before the digestion-ligation 
process is repeated. "Mutagenesis" as used herein includes such techniques known in the art 
as PCR mutagenesis, oligonucleotide-directed mutagenesis, site-directed mutagenesis, etc., 
and recursive sequence recombination by any of the techniques described herein. 

2. Reassembly PCR 

A further technique for recombining mutations in a nucleic acid sequence 
utilizes "reassembly PCR." This method can be used to assemble multiple segments that 
have been separately evolved into a full-length nucleic acid template such as a gene. This 
technique is performed when a pool of advantageous mutants is known from previous work 
or has been identified by screening mutants that may have been created by any mutagenesis 
technique known in the art, such as PCR mutagenesis, cassette mutagenesis, doped oligo 
mutagenesis, chemical mutagenesis, or propagation of the DNA template in vivo in mutator 
strains. Boundaries defining segments of a nucleic acid sequence of interest preferably lie in 
intergenic regions, introns, or areas of a gene not likely to have mutations of interest. 
Preferably, oligonucleotide primers (oligos) are synthesized for PCR amplification of 
segments of the nucleic acid sequence of interest, such that the sequences of the 
oligonucleotides overlap the junctions of two segments. The overlap region is typically 
about 10 to 100 nucleotides in length. Each of the segments is amplified with a set of such 
primers. The PCR products are then "reassembled" according to assembly protocols such as 
those discussed herein to assemble randomly fragmented genes. In brief, in an assembly 
protocol the PCR products are first purified away from the primers by, for example, gel 
electrophoresis or size exclusion chromatography. Purified products are mixed together and 
subjected to about 1-10 cycles of denaturing, reannealing, and extension in the presence of 
polymerase and deoxynucleoside triphosphates (dNTPs) and appropriate buffer salts in the 
absence of additional primers ("self-priming"). Subsequent PCR with primers flanking the 
gene is used to amplify the yield of the fully reassembled and shuffled genes. 
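
The self-priming assembly step can be pictured as joining segments through their shared junction overlaps. The sketch below is a simplified, single-strand model under stated assumptions: segments are supplied in order, each adjacent pair shares an exact overlap of at least `min_overlap` nucleotides, and polymerase extension is reduced to string concatenation. The names `overlap_length` and `reassemble` and the example gene are hypothetical.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of left that equals a prefix of right,
    or 0 if no overlap of at least min_overlap exists."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0

def reassemble(segments, min_overlap=10):
    """Join ordered segments through their junction overlaps."""
    assembled = segments[0]
    for seg in segments[1:]:
        k = overlap_length(assembled, seg, min_overlap)
        if k == 0:
            raise ValueError("adjacent segments do not share a sufficient overlap")
        assembled += seg[k:]
    return assembled

# Hypothetical 60-nt gene amplified as three segments with 12-nt overlaps.
gene = "ATGGCTAGCAAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCTAA"
segments = [gene[:30], gene[18:45], gene[33:]]
rebuilt = reassemble(segments, min_overlap=10)
```

The default of 10 nucleotides mirrors the lower end of the overlap range given in the text; separately evolved segments would simply replace the entries of `segments`.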

In some embodiments, the resulting reassembled genes are subjected to 
mutagenesis before the process is repeated. 

In a further embodiment, the PCR primers for amplification of segments of 
the nucleic acid sequence of interest are used to introduce variation into the gene of interest 
as follows. Mutations at sites of interest in a nucleic acid sequence are identified by 
screening or selection, by sequencing homologues of the nucleic acid sequence, and so on. 
Oligonucleotide PCR primers are then synthesized which encode wild type or mutant 
information at sites of interest. These primers are then used in PCR mutagenesis to generate 
libraries of full-length genes encoding permutations of wild type and mutant information at 
the designated positions. This technique is typically advantageous in cases where the 
screening or selection process is expensive, cumbersome, or impractical relative to the cost 
of sequencing the genes of mutants of interest and synthesizing mutagenic oligonucleotides. 
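
The size of such a library follows directly from the number of designated positions: with n sites that can each be wild type or mutant, there are 2^n permutations. A minimal sketch, assuming one mutant residue per site and using a hypothetical eight-residue sequence and the hypothetical helper `permutation_library`:

```python
from itertools import product

def permutation_library(wild_type, mutations):
    """Enumerate all wild-type/mutant permutations.

    mutations maps a 0-based position to the mutant residue at that site.
    """
    positions = sorted(mutations)
    library = []
    for choice in product((False, True), repeat=len(positions)):
        seq = list(wild_type)
        for pos, use_mutant in zip(positions, choice):
            if use_mutant:
                seq[pos] = mutations[pos]
        library.append("".join(seq))
    return library

# Hypothetical example: three sites of interest give 2**3 = 8 permutations.
wild_type = "MKTAYIAK"
variants = permutation_library(wild_type, {1: "R", 4: "F", 6: "G"})
```

The first entry of `variants` is the wild type and the last carries all three mutations.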



3. Site Directed Mutagenesis (SDM) with Oligonucleotides Encoding Homologue 
Mutations Followed by Shuffling 

In some embodiments of the invention, sequence information from one or 
more substrate sequences is added to a given "parental" sequence of interest, with 
subsequent recombination between rounds of screening or selection. Typically, this is done 
with site-directed mutagenesis performed by techniques well known in the art (e.g., Berger, 
Ausubel and Sambrook, supra) with one substrate as template and oligonucleotides 
encoding single or multiple mutations from other substrate sequences, e.g., homologous 
genes. After screening or selection for an improved phenotype of interest, the selected 
recombinant(s) can be further evolved using RSR techniques described herein. After 
screening or selection, site-directed mutagenesis can be done again with another collection 
of oligonucleotides encoding homologue mutations, and the above process repeated until the 
desired properties are obtained. 

When the difference between two homologues is one or more single point 
mutations in a codon, degenerate oligonucleotides can be used that encode the sequences in 
both homologues. One oligonucleotide can include many such degenerate codons and still 
allow one to exhaustively search all permutations over that block of sequence. 
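
As an illustration of how one degenerate oligonucleotide covers every permutation, the sketch below collapses aligned homologue sequences into IUPAC ambiguity codes and then expands the result again. It assumes the homologues are already aligned, ungapped and of equal length; the helper names and the 9-nt example are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes, keyed by the sorted base set.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}
EXPAND = {code: bases for bases, code in IUPAC.items()}

def degenerate_oligo(*homologues):
    """Collapse aligned sequences into a single degenerate sequence."""
    if len({len(h) for h in homologues}) != 1:
        raise ValueError("homologues must be aligned and of equal length")
    return "".join(IUPAC["".join(sorted(set(col)))] for col in zip(*homologues))

def expand(oligo):
    """List every concrete sequence encoded by a degenerate oligo."""
    return ["".join(p) for p in product(*(EXPAND[c] for c in oligo))]

# Two hypothetical homologues differing at two positions.
oligo = degenerate_oligo("GCTAAAGGT", "GCCAGAGGT")
encoded = expand(oligo)
```

Here `oligo` contains two degenerate positions, so `encoded` holds four sequences: both parents and the two recombinants between them.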

When the homologue sequence space is very large, it can be advantageous to 
restrict the search to certain variants. Thus, for example, computer modeling tools (Lathrop 
et al., J. Mol. Biol. 255:641-665 (1996)) can be used to model each homologue mutation 
onto the target protein and discard any mutations that are predicted to grossly disrupt 
structure and function. 

4. In vitro DNA Shuffling Formats 

In one embodiment for shuffling DNA sequences in vitro, the initial 
substrates for recombination are a pool of related sequences, e.g., different variant forms, as 
homologs from different individuals, strains, or species of an organism, or related sequences 
from the same organism, as allelic variations. The sequences can be DNA or RNA and can 
be of various lengths depending on the size of the gene or DNA fragment to be recombined 
or reassembled. Preferably the sequences are from 50 base pairs (bp) to 50 kilobases (kb). 

The pool of related substrates is converted into overlapping fragments, e.g., 
from about 5 bp to 5 kb or more. Often, for example, the size of the fragments is from about 
10 bp to 1000 bp, and sometimes the size of the DNA fragments is from about 100 bp to 500 
bp. The conversion can be effected by a number of different methods, such as DNase I or 
RNase digestion, random shearing or partial restriction enzyme digestion. For discussions of 
protocols for the isolation, manipulation, enzymatic digestion, and the like of nucleic acids, 
see, for example, Sambrook et al. and Ausubel, both supra. The concentration of nucleic 
acid fragments of a particular length and sequence is often less than 0.1% or 1% by weight 
of the total nucleic acid. The number of different specific nucleic acid fragments in the 
mixture is usually at least about 100, 500 or 1000. 
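
The conversion of a substrate pool into overlapping fragments can be mimicked computationally by drawing fragments of random length and position. This is only a rough stand-in for DNase I digestion or shearing, assuming uniform random break points; the function name, the size limits chosen for the example and the two substrate sequences are hypothetical.

```python
import random

def random_fragments(substrates, min_len, max_len, n_fragments, rng):
    """Draw fragments of random length and position from a pool of substrates.

    Assumes every substrate is at least min_len long.
    """
    fragments = []
    for _ in range(n_fragments):
        seq = rng.choice(substrates)
        length = rng.randint(min_len, min(max_len, len(seq)))
        start = rng.randint(0, len(seq) - length)
        fragments.append(seq[start:start + length])
    return fragments

# Two hypothetical related substrates, fragmented into 10 to 20 nt pieces.
pool = [
    "ATGGCTAGCAAGGAGCTGTTCACCGGGGTGGTGCCCATCC",
    "ATGGCTAGCAAAGAGCTGTTCACTGGGGTGGTGCCCATCC",
]
fragments = random_fragments(pool, 10, 20, 100, random.Random(0))
```

Because start positions are drawn independently, the fragments overlap one another, which is what later allows them to prime on each other.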

The mixed population of nucleic acid fragments is converted to at least 
partially single-stranded form using a variety of techniques, including, for example, heating, 
chemical denaturation, use of DNA binding proteins, and the like. Conversion can be 
effected by heating to about 80 °C to 100 °C, more preferably from 90 °C to 96 °C, to form 
single-stranded nucleic acid fragments and then reannealing. Conversion can also be 
effected by treatment with single-stranded DNA binding protein (see Wold, Annu. Rev. 
Biochem. 66:61-92 (1997)) or recA protein (see, e.g., Kiianitsa, Proc. Natl. Acad. Sci. USA 
94:7837-7840 (1997)). Single-stranded nucleic acid fragments having regions of sequence 
identity with other single-stranded nucleic acid fragments can then be reannealed by cooling 
to 20 °C to 75 °C, and preferably from 40 °C to 65 °C. Renaturation can be accelerated by 
the addition of polyethylene glycol (PEG), other volume-excluding reagents or salt. The salt 
concentration is preferably from 0 mM to 200 mM, more preferably the salt concentration is 
from 10 mM to 100 mM. The salt may be KCl or NaCl. The concentration of PEG is 
preferably from 0% to 20%, more preferably from 5% to 10%. The fragments that reanneal 
can be from different substrates. The annealed nucleic acid fragments are incubated in the 
presence of a nucleic acid polymerase, such as Taq or Klenow, and dNTPs (i.e., dATP, 
dCTP, dGTP and dTTP). If regions of sequence identity are large, Taq polymerase can be 
used with an annealing temperature of between 45-65 °C. If the areas of identity are small, 
Klenow polymerase can be used with an annealing temperature of between 20-30 °C. The 
polymerase can be added to the random nucleic acid fragments prior to annealing, 
simultaneously with annealing or after annealing. 
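
The numeric ranges in the preceding paragraph can be collected into a small lookup table for checking a planned set of conditions. The values are taken from the text; the parameter names, the `classify` helper and the three-way labelling are illustrative assumptions, not part of the described method.

```python
# (broad range, preferred range) as quoted in the text above.
RANGES = {
    "denature_c": ((80, 100), (90, 96)),  # heating to single-stranded form, deg C
    "anneal_c": ((20, 75), (40, 65)),     # cooling for reannealing, deg C
    "salt_mm": ((0, 200), (10, 100)),     # KCl or NaCl, mM
    "peg_pct": ((0, 20), (5, 10)),        # PEG, percent
}

def classify(name, value):
    """Label a value as 'preferred', 'acceptable' or 'out of range'."""
    broad, preferred = RANGES[name]
    if preferred[0] <= value <= preferred[1]:
        return "preferred"
    if broad[0] <= value <= broad[1]:
        return "acceptable"
    return "out of range"

# A hypothetical set of planned conditions.
plan = {"denature_c": 94, "anneal_c": 30, "salt_mm": 50, "peg_pct": 25}
report = {name: classify(name, value) for name, value in plan.items()}
```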

The process of denaturation, renaturation and incubation in the presence of 
polymerase of overlapping fragments to generate a collection of polynucleotides containing 
different permutations of fragments is sometimes referred to as shuffling of the nucleic acid 
in vitro. This cycle is repeated for a desired number of times. Preferably the cycle is 
repeated from 2 to 100 times, more preferably the sequence is repeated from 10 to 40 times. 
The resulting nucleic acids are a family of double-stranded polynucleotides of from about 50 
bp to about 100 kb, preferably from 500 bp to 50 kb. The population represents variants of 
the starting substrates showing substantial sequence identity thereto but also diverging at 
several positions. The population has many more members than the starting substrates. The 
population of fragments resulting from shuffling is used to transform host cells, optionally 
after cloning into a vector. 
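
The net effect of repeated cycles, a population of chimeras built from blocks of the starting substrates, can be approximated with a crossover model. The sketch below does not simulate annealing or extension. It assumes aligned parents of equal length and lets the growing strand switch parent only where the preceding `min_identity` bases are identical in both parents, a crude stand-in for the requirement of sequence identity. The function name, the parameters and the sequences are hypothetical.

```python
import random

def shuffle_once(parents, rng, switch_prob=0.1, min_identity=4):
    """Build one chimera by copying from aligned parents with occasional switches."""
    length = len(parents[0])
    current = rng.randrange(len(parents))
    out = []
    for i in range(length):
        if i >= min_identity and rng.random() < switch_prob:
            candidate = rng.randrange(len(parents))
            window = slice(i - min_identity, i)
            # Switch only across a stretch that is identical in both parents.
            if parents[candidate][window] == parents[current][window]:
                current = candidate
        out.append(parents[current][i])
    return "".join(out)

# Two hypothetical 30-nt parents differing at two positions.
parents = [
    "ATGGCTAGCAAGGAGCTGTTCACCGGGGTG",
    "ATGGCTAGCAAAGAGCTGTTCACTGGGGTG",
]
rng = random.Random(42)
chimeras = [shuffle_once(parents, rng) for _ in range(50)]
```

Every position of every chimera is inherited from one of the parents at the same position, so the products retain substantial identity to the substrates while combining their differences.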

In one embodiment utilizing in vitro shuffling, subsequences of 
recombination substrates can be generated by amplifying the full-length sequences under 
conditions which produce a substantial fraction, typically at least 20 percent or more, of 
incompletely extended amplification products. Another embodiment uses random primers to 
prime the entire template DNA to generate less than full-length amplification products. The 
amplification products, including the incompletely extended amplification products, are 
denatured and subjected to at least one additional cycle of reannealing and amplification. 
This variation, in which at least one cycle of reannealing and amplification provides a 
substantial fraction of incompletely extended products, is termed "stuttering." In the 
subsequent amplification round, the partially extended (less than full-length) products 
reanneal to and prime extension on different sequence-related template species. In another 
embodiment, the conversion of substrates to fragments can be effected by partial PCR 
amplification of substrates. 

In another embodiment, a mixture of fragments is spiked with one or more 
oligonucleotides. The oligonucleotides can be designed to include precharacterized 
mutations of a wildtype sequence, or sites of natural variations between individuals or 
species. The oligonucleotides also include sufficient sequence or structural homology 
flanking such mutations or variations to allow annealing with the wildtype fragments. 
Annealing temperatures can be adjusted depending on the length of homology. 

In a further embodiment, recombination occurs in at least one cycle by 
template switching, such as when a DNA fragment derived from one template primes on the 
homologous position of a related but different template. Template switching can be induced 
by addition of recA (see, Kiianitsa supra (1997)), rad51 (see, Namsaraev, Mol. Cell. Biol. 
17:5359-5368 (1997)), rad55 (see, Clever, EMBO J. 16:2535-2544 (1997)), rad57 (see, 
Sung, Genes Dev. 11:1111-1121 (1997)) or other polymerases (e.g., viral polymerases, 
reverse transcriptase) to the amplification mixture. Template switching can also be increased 
by increasing the DNA template concentration. 

Another embodiment utilizes at least one cycle of amplification, which can be 
conducted using a collection of overlapping single-stranded DNA fragments of related 
sequence, and different lengths. Fragments can be prepared using a single-stranded DNA 
phage, such as M13 (see, Wang, Biochemistry 36:9486-9492 (1997)). Each fragment can 
hybridize to and prime polynucleotide chain extension of a second fragment from the 
collection, thus forming sequence-recombined polynucleotides. In a further variation, 
ssDNA fragments of variable length can be generated from a single primer by Pfu, Taq, 
Vent, Deep Vent, UlTma DNA polymerase or other DNA polymerases on a first DNA 
template (see, Cline, Nucleic Acids Res. 24:3546-3551 (1996)). The single-stranded DNA 
fragments are used as primers for a second, Kunkel-type template, consisting of a 
uracil-containing circular ssDNA. This results in multiple substitutions of the first template into 
the second. See, Levichkin, Mol. Biology 29:572-577 (1995); Jung, Gene 121:17-24 (1992). 

In some embodiments of the invention, shuffled nucleic acids obtained by use 
of the recursive recombination methods of the invention are put into a cell and/or organism 
for screening. Shuffled monooxygenase genes can be introduced into, for example, bacterial 
cells, yeast cells, fungal cells, vertebrate cells, invertebrate cells or plant cells for initial 
screening. Bacillus species (such as B. subtilis) and E. coli are two examples of suitable 
bacterial cells into which one can insert and express shuffled monooxygenase genes which 
provide for convenient shuttling to other cell types (a variety of vectors for shuttling material 
between these bacterial cells and eukaryotic cells are available; see, Sambrook, Ausubel and 
Berger, all supra). The shuffled genes can be introduced into bacterial, fungal or yeast cells 
either by integration into the chromosomal DNA or as plasmids. 

Although bacterial and yeast systems are most preferred in the present 
invention, in one embodiment, shuffled genes can also be introduced into plant cells for 
production purposes (it will be appreciated that transgenic plants are, increasingly, an 
important source of industrial enzymes). Thus, a transgene of interest can be modified using 
the recursive sequence recombination methods of the invention in vitro and reinserted into 
the cell for in vivo/in situ selection for the new or improved monooxygenase property, in 
bacteria, eukaryotic cells, or whole eukaryotic organisms. 

5. In vivo DNA Shuffling Formats 

In some embodiments of the invention, DNA substrate molecules are 
introduced into cells, wherein the cellular machinery directs their recombination. For 
example, a library of mutants is constructed and screened or selected for mutants with 
improved phenotypes by any of the techniques described herein. The DNA substrate 
molecules encoding the best candidates are recovered by any of the techniques described 
herein, then fragmented and used to transfect a plant host and screened or selected for 
improved function. If further improvement is desired, the DNA substrate molecules are 
recovered from the host cell, such as by PCR, and the process is repeated until a desired 
level of improvement is obtained. In some embodiments, the fragments are denatured and 
reannealed prior to transfection, coated with recombination stimulating proteins such as 
recA, or co-transfected with a selectable marker such as NeoR to allow the positive selection 
for cells receiving recombined versions of the gene of interest. Methods for in vivo shuffling 
are described in, for example, PCT application WO 98/13487 and WO 97/20078. 
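
The screen, recover, recombine and repeat cycle described above has the shape of a simple loop. The following toy model is a sketch only: it stands in for the cellular recombination machinery with a single-point crossover, uses a made-up fitness score (the number of "1" characters in a string), and keeps the best candidates each round. None of these choices come from the text; they are assumptions made to keep the example runnable.

```python
import random

def crossover(pair, rng):
    """Single-point crossover between two equal-length sequences."""
    a, b = pair
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def evolve(library, fitness, target, max_rounds, keep, rng):
    """Repeat selection and recombination until target fitness or max_rounds."""
    rounds = 0
    while max(map(fitness, library)) < target and rounds < max_rounds:
        parents = sorted(library, key=fitness, reverse=True)[:keep]
        children = [crossover(rng.sample(parents, 2), rng)
                    for _ in range(len(library) - keep)]
        library = parents + children  # best candidates are carried forward
        rounds += 1
    return max(library, key=fitness), rounds

def score(seq):
    return seq.count("1")  # toy stand-in for a screened phenotype

start = ["11110000", "00001111", "11001100", "00110011"]
best, rounds_used = evolve(start, score, target=8, max_rounds=50,
                           keep=2, rng=random.Random(1))
```

Because the selected parents are carried into the next round, the best score in the library never decreases from one round to the next.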

The efficiency of in vivo shuffling can be enhanced by increasing the copy 
number of a gene of interest in the host cells. For example, the majority of bacterial cells in 
stationary phase cultures grown in rich media contain two, four or eight genomes. In 
minimal medium the cells contain one or two genomes. The number of genomes per 
bacterial cell thus depends on the growth rate of the cell as it enters stationary phase. This is 
because rapidly growing cells contain multiple replication forks, resulting in several 
genomes in the cells after termination. The number of genomes is strain dependent, 
although all strains tested have more than one chromosome in stationary phase. The number 
of genomes in stationary phase cells decreases with time. This appears to be due to 
fragmentation and degradation of entire chromosomes, similar to apoptosis in mammalian 
cells. This fragmentation of genomes in cells containing multiple genome copies results in 
massive recombination and mutagenesis. The presence of multiple genome copies in such 
cells results in a higher frequency of homologous recombination in these cells, both between 
copies of a gene in different genomes within the cell, and between a genome within the cell 
and a transfected fragment. The increased frequency of recombination allows one to evolve 
a gene more quickly to acquire optimized characteristics. 

In nature, the existence of multiple genomic copies in a cell type would 
usually not be advantageous due to the greater nutritional requirements needed to maintain 
this copy number. However, artificial conditions can be devised to select for high copy 
number. Modified cells having recombinant genomes are grown in rich media (in which 
conditions, multicopy number should not be a disadvantage) and exposed to a mutagen, such 
as ultraviolet or gamma irradiation or a chemical mutagen, e.g., mitomycin, nitrous acid, 
photoactivated psoralens, alone or in combination, which induces DNA breaks amenable to 
repair by recombination. These conditions select for cells having multicopy number due to 
the greater efficiency with which mutations can be excised. Modified cells surviving 
exposure to mutagen are enriched for cells with multiple genome copies. If desired, selected 
cells can be individually analyzed for genome copy number (e.g., by quantitative 
hybridization with appropriate controls). For example, individual cells can be sorted using a 
cell sorter for those cells containing more DNA, e.g., using DNA specific fluorescent 
compounds or sorting for increased size using light dispersion. Some or all of the collection 
of cells surviving selection are tested for the presence of a gene that is optimized for the 
desired property. 

In one embodiment, phage libraries are made and recombined in mutator 
strains such as cells with mutant or impaired gene products of mutS, mutT, mutH, mutL, 
uvrD, dcm, vsr, umuC, umuD, sbcB, recJ, etc. The impairment is achieved by genetic 
mutation, allelic replacement, selective inhibition by an added reagent such as a small 
compound or an expressed antisense RNA, or other techniques. High multiplicity of 
infection (MOI) libraries are used to infect the cells to increase recombination frequency. 

Additional strategies for making phage libraries and/or for recombining DNA 
from donor and recipient cells are set forth in U.S. Pat. No. 5,521,077. Additional 
recombination strategies for recombining plasmids in yeast are set forth in WO 97/07205. 

6. Whole Genome Shuffling 

In one embodiment, the selection methods herein are utilized in a "whole 
genome shuffling" format. An extensive guide to the many forms of whole genome 
shuffling is found in the pioneering application to the inventors and their co-workers entitled 
"Evolution of Whole Cells and Organisms by Recursive Sequence Recombination," 
Attorney Docket No. 018097-020720US filed July 15, 1998 by del Cardayre et al. (USSN 
09/161,188). 

In brief, whole genome shuffling makes no presuppositions at all regarding 
what nucleic acids may confer a desired property. Instead, entire genomes (e.g., from a 
genomic library, or isolated from an organism) are shuffled in cells and selection protocols 
applied to the cells. 

The fermentation of microorganisms for the production of natural products is 
the oldest and most sophisticated application of biocatalysis. 

The methods herein allow monooxygenase biocatalysts to be improved at a 
faster pace than conventional methods. Whole genome shuffling can at least double the rate 
of strain improvement for microorganisms used in fermentation as compared to traditional 
methods. This provides for a relative decrease in the cost of fermentation processes. New 
products can enter the market sooner, producers can increase profits as well as market share, 
and consumers gain access to more products of higher quality and at lower prices. Further, 
increased efficiency of production processes translates to less waste production and more 
frugal use of resources. Whole genome shuffling provides a means of accumulating multiple 
useful mutations per cycle and thus eliminates the inherent limitation of current strain 
improvement programs (SIPs). 

DNA shuffling provides recursive mutagenesis, recombination, and selection 
of DNA sequences. A key difference between DNA shuffling-mediated recombination and 
natural sexual recombination is that DNA shuffling effects both the pairwise (two parents) 
and the poolwise (multiple parents) recombination of parent molecules. Natural 
recombination is more conservative and is limited to pairwise recombination. In nature, 
pairwise recombination provides stability within a population by preventing large leaps in 
sequences or genomic structure that can result from poolwise recombination. However, for 
the purposes of directed evolution, poolwise recombination is appealing since the beneficial 
mutations of multiple parents can be combined during a single cross to produce a superior 
offspring. Poolwise recombination is analogous to the crossbreeding of inbred strains in 
classic strain improvement, except that the crosses occur between many strains at once. In 
essence, poolwise recombination is a sequence of events that effects the recombination of a 
population of nucleic acid sequences that results in the generation of new nucleic acids that 
contain genetic information from more than two of the original nucleic acids. 
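
The practical difference between the two modes can be made concrete by counting crosses. The sketch below represents each parent as the set of beneficial mutations it carries and makes an idealized, best-case assumption: an offspring inherits every mutation of all of its parents. Under that assumption a poolwise cross unites all mutations in one step, whereas pairwise crossing needs about log2(n) rounds for n parents. The function names and mutation labels are hypothetical.

```python
def pairwise_rounds(parents):
    """Rounds of two-parent crosses needed, in the best case, to unite all
    mutations. parents must be a non-empty list of sets."""
    pool, rounds = list(parents), 0
    while len(pool) > 1:
        nxt = [pool[i] | pool[i + 1] for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2:
            nxt.append(pool[-1])  # unpaired parent waits for the next round
        pool, rounds = nxt, rounds + 1
    return pool[0], rounds

def poolwise_cross(parents):
    """One multi-parent cross uniting the mutations of all parents."""
    return frozenset().union(*parents)

# Eight hypothetical parents, each carrying one distinct beneficial mutation.
parents = [frozenset({f"m{i}"}) for i in range(8)]
combined, rounds_needed = pairwise_rounds(parents)
pooled = poolwise_cross(parents)
```

For eight parents the pairwise scheme needs three rounds to reach the genotype that the poolwise cross reaches in one.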

There are a few general methods for effecting efficient recombination in 
prokaryotes. Bacteria have no known sexual cycle per se, but there are natural mechanisms 
by which the genomes of these organisms undergo recombination. These mechanisms 
include natural competence, phage-mediated transduction, and cell-cell conjugation. 
Bacteria that are naturally competent are capable of efficiently taking up naked DNA from 
the environment. If homologous, this DNA undergoes recombination with the genome of 
the cell, resulting in genetic exchange. Bacillus subtilis, the primary production organism of 
the enzyme industry, is known for the efficiency with which it carries out this process. 

In generalized transduction, a bacteriophage mediates genetic exchange. A 
transducing phage will often package headfuls of the host genome. These phage can infect 
a new host and deliver a fragment of the former host genome which is frequently integrated 
via homologous recombination. Cells can also transfer DNA between themselves by 
conjugation. Cells containing the appropriate mating factors transfer episomes as well as 
entire chromosomes to an appropriate acceptor cell where it can recombine with the acceptor 
genome. Conjugation resembles sexual recombination for microbes and can be intraspecific, 
interspecific, and intergeneric. For example, an efficient means of transforming 
Streptomyces sp., a genus responsible for producing many commercial antibiotics, is by the 
conjugal transfer of plasmids from Escherichia coli. 

For many industrial microorganisms, knowledge of competence, transducing 
phage, or fertility factors is lacking. Protoplast fusion has been developed as a versatile and 
general alternative to these natural methods of recombination. Protoplasts are prepared by 
removing the cell wall by treating cells with lytic enzymes in the presence of osmotic 
stabilizers. In the presence of a fusogenic agent, such as polyethylene glycol (PEG), 
protoplasts are induced to fuse and form transient hybrids or "fusants." During this hybrid 
state, genetic recombination occurs at high frequency, allowing the genomes to reassort. The 
final step is the successful segregation and regeneration of viable cells from the fused 
protoplasts. Protoplast fusion can be intraspecific, interspecific, and intergeneric and has 
been applied to both prokaryotes and eukaryotes. In addition, it is possible to fuse more than 
two cells, thus providing a mechanism for effecting poolwise recombination. While no 
fertility factors, transducing phages or competency development is needed for protoplast 
fusion, a method for the formation, fusing, and regeneration of protoplasts is typically 
optimized for each organism. 

Modifications can be made to the method and materials as hereinbefore 
described without departing from the spirit or scope of the invention as claimed, and the 
invention can be put to a number of different uses, including: 

The use of an integrated system to test monooxygenase in shuffled DNAs, 
including in an iterative process. 



7. Family Shuffling P450s 

For identification of homologous genes used in family shuffling strategies, 
representative alignments of P450 enzymes can be found in the Appendices of the volume 
Cytochrome P450: Structure, Mechanism, and Biochemistry, 2nd Edition (ed. by 
Paul R. Ortiz de Montellano, Plenum Press, New York, 1995) ("Ortiz de Montellano"). An 
up-to-date list of P450s can be found electronically on the World Wide Web 
(http://drnelson.utmem.edu/homepage.html). 

To illustrate the family shuffling approach to improving P450 enzymes, one 
or more of the more than 1000 members of this superfamily is selected, aligned with similar 
homologous sequences, and shuffled against these homologous sequences. 

For example, the gene for the bovine P450scc enzyme, CYP11A1, belongs to a 
family of closely related P450 genes. DNA family shuffling (Crameri et al., Nature 
391:288) can be used to create hybrid variants from these genes, variants of which can be 
screened for enhanced conversion of cholesterol to pregnenolone. 
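
Choosing which homologues to shuffle against a selected gene usually starts from an alignment. As a minimal sketch, the helper below scores percent identity over pre-aligned sequences and keeps those above a cutoff. The text does not specify any identity threshold, so the 70 percent cutoff, the helper names and the short made-up sequences are all assumptions for illustration.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)

def select_family(query, aligned, threshold):
    """Names of aligned sequences at or above the identity threshold."""
    return [name for name, seq in aligned.items()
            if percent_identity(query, seq) >= threshold]

# Hypothetical aligned fragments ('-' marks an alignment gap).
query = "ATGGCTAGCAAGGAGCTGTT"
aligned = {
    "homologue_1": "ATGGCTAGCAAAGAGCTGTT",  # 1 mismatch
    "homologue_2": "ATGGCA---AAGGAACTGTT",  # gapped, 2 mismatches
    "homologue_3": "TTGACTCGCTAGCAGGTCAT",  # distant
}
family = select_family(query, aligned, threshold=70.0)
```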

The screening is done most easily in yeast, but a bacterial system could also 
be constructed by co-expressing the accessory electron transport proteins adrenodoxin and 
adrenodoxin reductase. DNA from clones with improved activity can be shuffled together in 
subsequent rounds of DNA shuffling and screened for further improvement. 

Subsequent steps in the biosynthesis of steroids such as cortisone and 
estradiol are also catalyzed by cytochrome P450 enzymes (see, Ortiz de Montellano, chapter 
12). For example, conversion of pregnenolone to cortisol involves four enzymatic steps, 
three of which are catalyzed by cytochrome P450 enzymes. Each of these enzymes belongs 
to P450 gene families, which also are amenable to DNA family shuffling. 

One model P450 system has been developed by Pompon and co-workers 
(e.g., Duport et al., Nature Biotechnol. 16:186; Pompon et al., Methods Enzymol. 272:51). 
In particular, they have developed a yeast strain that produces pregnenolone from galactose, 
and an additional strain that further converts pregnenolone to progesterone. One of the 
enzymes expressed in these strains is the bovine P450scc. Optimization of this strain, or of 
related processes useful for steroid production, can be assisted by DNA shuffling of P450scc. 
Numerous other microbial expression systems for P450-type enzymes are known in the 
literature. 

8. Codon Modification Shuffling 

Procedures for codon modification shuffling are described in detail in 
SHUFFLING OF CODON ALTERED GENES, Phillip A. Patten and Willem P.C. Stemmer, 
filed September 29, 1998, USSN 60/102362 and in SHUFFLING OF CODON ALTERED 
GENES, Phillip A. Patten and Willem P.C. Stemmer, filed January 29, 1999, USSN 
60/117729. In brief, by synthesizing nucleic acids in which the codons encoding
polypeptides are altered, it is possible to access a completely different mutational cloud upon 
subsequent mutation of the nucleic acid. This increases the sequence diversity of the starting 
nucleic acids for shuffling protocols, which alters the rate and results of forced evolution 
procedures. Codon modification procedures can be used to modify any nucleic acid
described herein, e.g., prior to performing DNA shuffling, or codon modification approaches
can be used in conjunction with oligonucleotide shuffling procedures as described supra.

In these methods, a first nucleic acid sequence encoding a first polypeptide
sequence is selected. A plurality of codon altered nucleic acid sequences, each of which
encodes the first polypeptide, or a modified or related polypeptide, is then selected (e.g., a
library of codon altered nucleic acids can be selected in a biological assay which recognizes
library components or activities), and the plurality of codon-altered nucleic acid sequences is
recombined to produce a target codon altered nucleic acid encoding a second protein. The
target codon altered nucleic acid is then screened for a detectable functional or structural
property, optionally including comparison to the properties of the first polypeptide and/or
related polypeptides. The goal of such screening is to identify a polypeptide that has a
structural or functional property equivalent or superior to the first polypeptide or related
polypeptide. A nucleic acid encoding such a polypeptide can be used in essentially any
procedure desired, including introducing the target codon altered nucleic acid into a cell,
vector, virus, attenuated virus (e.g., as a component of a vaccine or immunogenic
composition), transgenic organism, or the like.
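
The idea of a plurality of codon altered nucleic acids that all encode the same first
polypeptide can be illustrated with a short Python sketch. It is only a sketch under stated
assumptions: it uses the standard genetic code, picks synonymous codons uniformly at
random, and operates on a hypothetical six-codon sequence (GENE); the function names are
illustrative and do not come from the cited applications.

```python
import random

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order ('*' marks a stop codon).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Map each amino acid (or stop) to all codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna: str) -> str:
    """Translate a DNA string made of whole codons into one-letter amino acids."""
    if len(dna) % 3:
        raise ValueError("sequence must contain whole codons")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def codon_altered(dna: str, rng: random.Random) -> str:
    """Replace every codon with a randomly chosen synonymous codon."""
    if len(dna) % 3:
        raise ValueError("sequence must contain whole codons")
    return "".join(
        rng.choice(SYNONYMS[CODON_TABLE[dna[i:i + 3]]]) for i in range(0, len(dna), 3)
    )


def codon_altered_library(dna: str, size: int, seed: int = 0) -> list:
    """Build a library of codon altered sequences encoding the same polypeptide."""
    rng = random.Random(seed)
    return [codon_altered(dna, rng) for _ in range(size)]


# Hypothetical sequence (assumption: illustrative only); encodes M-A-L-S-K-stop.
GENE = "ATGGCTCTGAGCAAATAA"

if __name__ == "__main__":
    for variant in codon_altered_library(GENE, 5, seed=1):
        print(variant, translate(variant))
```

Every member of the library translates to the same polypeptide while differing at the
nucleotide level, which is the property that lets subsequent mutation reach a different set
of neighboring sequences.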



9. Oligonucleotide and in silico shuffling formats

In addition to the formats for shuffling noted above, at least two additional
related formats are useful in the practice of the present invention. The first, referred to as "in
silico" shuffling, utilizes computer algorithms to perform "virtual" shuffling using genetic
operators in a computer. As applied to the present invention, gene sequence strings are
recombined in a computer system and desirable products are made, e.g., by reassembly PCR
of synthetic oligonucleotides. In silico shuffling is described in detail in Selifonov and
Stemmer in "METHODS FOR MAKING CHARACTER STRINGS,
POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS"
filed February 5, 1999, USSN 60/118854. In brief, genetic operators (algorithms which
represent given genetic events such as point mutations, recombination of two strands of
homologous nucleic acids, etc.) are used to model recombinational or mutational events
which can occur in one or more nucleic acids, e.g., by aligning nucleic acid sequence strings
(using standard alignment software, or by manual inspection and alignment) and predicting
recombinational outcomes. The predicted recombinational outcomes are used to produce
corresponding molecules, e.g., by oligonucleotide synthesis and reassembly PCR.
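
A minimal Python sketch of two such genetic operators, point mutation and recombination of
aligned strings, is given below. It is not the algorithm of the cited application; the
requirement of a three-base run of identity before a crossover, the mutation rate and the
parent strings are all assumptions made for illustration.

```python
import random


def point_mutation(seq: str, rng: random.Random, alphabet: str = "ACGT") -> str:
    """Genetic operator: substitute one randomly chosen position."""
    pos = rng.randrange(len(seq))
    choices = [base for base in alphabet if base != seq[pos]]
    return seq[:pos] + rng.choice(choices) + seq[pos + 1:]


def crossover(a: str, b: str, rng: random.Random, min_identity_run: int = 3) -> str:
    """Genetic operator: single crossover between two aligned strings.

    The switch point must follow a run of identical bases, loosely mimicking the
    need for local homology at a recombination site.
    """
    if len(a) != len(b):
        raise ValueError("parents must be aligned to the same length")
    sites = [
        i for i in range(min_identity_run, len(a))
        if a[i - min_identity_run:i] == b[i - min_identity_run:i]
    ]
    if not sites:
        return a  # no shared run of identity: return the first parent unchanged
    cut = rng.choice(sites)
    return a[:cut] + b[cut:]


def in_silico_shuffle(parents, n_children, seed=0, mutation_rate=0.1):
    """Generate predicted recombinant strings from a set of aligned parents."""
    rng = random.Random(seed)
    children = []
    for _ in range(n_children):
        a, b = rng.sample(parents, 2)
        child = crossover(a, b, rng)
        if rng.random() < mutation_rate:
            child = point_mutation(child, rng)
        children.append(child)
    return children


# Hypothetical aligned parent strings (assumption: illustrative only).
PARENTS = ["ATGGCTAAGCTGACC", "ATGGCAAAGCTTACC", "ATGGCTAAACTGACG"]

if __name__ == "__main__":
    for child in in_silico_shuffle(PARENTS, 5, seed=1):
        print(child)
```

The predicted strings would then be realized physically, e.g., by oligonucleotide synthesis
and reassembly PCR as described above.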

The second useful format is referred to as "oligonucleotide mediated
shuffling", in which oligonucleotides corresponding to a family of related homologous
nucleic acids (e.g., as applied to the present invention, interspecific or allelic variants of a
dioxygenase nucleic acid) are recombined to produce selectable nucleic acids. This
format is described in detail in Crameri et al. "OLIGONUCLEOTIDE MEDIATED
NUCLEIC ACID RECOMBINATION" filed February 5, 1999, USSN 60/118,813 and
Crameri et al. "OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID
RECOMBINATION" filed June 24, 1999, USSN 60/141,049. The technique can be used to
recombine homologous or even non-homologous nucleic acid sequences.

One advantage of the oligonucleotide-mediated recombination is the ability to
recombine homologous nucleic acids with low sequence similarity, or even non-homologous
nucleic acids. In these low-homology oligonucleotide shuffling methods, one or more sets of
fragmented nucleic acids are recombined, e.g., with a set of crossover family diversity
oligonucleotides. Each of these crossover oligonucleotides has a plurality of sequence
diversity domains corresponding to a plurality of sequence diversity domains from
homologous or non-homologous nucleic acids with low sequence similarity. The
fragmented oligonucleotides, which are derived by comparison to one or more homologous
or non-homologous nucleic acids, can hybridize to one or more regions of the crossover
oligos, facilitating recombination.

When recombining homologous nucleic acids, sets of overlapping family
gene shuffling oligonucleotides (which are derived by comparison of homologous nucleic
acids and synthesis of oligonucleotide fragments) are hybridized and elongated (e.g., by
reassembly PCR), providing a population of recombined nucleic acids, which can be
selected for a desired trait or property. Typically, the set of overlapping family shuffling
gene oligonucleotides includes a plurality of oligonucleotide member types which have
consensus region subsequences derived from a plurality of homologous target nucleic acids.

Typically, family gene shuffling oligonucleotides are provided by aligning
homologous nucleic acid sequences to select conserved regions of sequence identity and
regions of sequence diversity. A plurality of family gene shuffling oligonucleotides are
synthesized (serially or in parallel) which correspond to at least one region of sequence
diversity.
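
The selection of conserved and diverse regions from an alignment can be sketched in Python
as follows. This is an illustrative simplification: a column is called conserved only when
every sequence agrees, the flank length is arbitrary, and the three aligned strings are
hypothetical. Real oligonucleotide design would also consider length, melting temperature
and overlap, which are not modeled here.

```python
from itertools import groupby


def classify_columns(alignment):
    """Return a mask with 'C' for conserved columns and 'D' for diverse ones."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    return "".join(
        "C" if len({seq[i] for seq in alignment}) == 1 else "D" for i in range(length)
    )


def diverse_regions(alignment):
    """Half-open (start, end) intervals of contiguous diverse columns."""
    regions, pos = [], 0
    for kind, run in groupby(classify_columns(alignment)):
        n = len(list(run))
        if kind == "D":
            regions.append((pos, pos + n))
        pos += n
    return regions


def family_oligos(alignment, flank=4):
    """For each diverse region, the distinct variants plus conserved flanking bases."""
    length = len(alignment[0])
    oligos = {}
    for start, end in diverse_regions(alignment):
        lo, hi = max(0, start - flank), min(length, end + flank)
        oligos[(start, end)] = sorted({seq[lo:hi] for seq in alignment})
    return oligos


# Hypothetical aligned homologs (assumption: illustrative only).
ALIGNMENT = ["ATGGCTAAGCTGACC", "ATGGCAAAGCTTACC", "ATGGCTAAACTGACC"]

if __name__ == "__main__":
    print(classify_columns(ALIGNMENT))
    for region, variants in family_oligos(ALIGNMENT).items():
        print(region, variants)
```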

Sets of fragments, or subsets of fragments used in oligonucleotide shuffling
approaches can be provided by cleaving one or more homologous nucleic acids (e.g., with a
DNase), or, more commonly, by synthesizing a set of oligonucleotides corresponding to a
plurality of regions of at least one nucleic acid (typically oligonucleotides corresponding to a
full-length nucleic acid are provided as members of a set of nucleic acid fragments). In the
shuffling procedures herein, these cleavage fragments (e.g., fragments of monooxygenases)
can be used in conjunction with family gene shuffling oligonucleotides, e.g., in one or more
recombination reactions to produce recombinant monooxygenase nucleic acids.



10. Chimeric shuffling templates

In addition to the naturally occurring, mutated and synthetic oligonucleotides
discussed above, polynucleotides encoding chimeric polypeptides can be used as substrates
for shuffling in any of the above-described shuffling formats. Nucleic acids encoding
chimeras prepared by art-recognized methods are encompassed herein. Art-recognized methods
for preparing chimeras are applicable to the methods described herein (see, for example,
Shimoji et al. Biochemistry 37: 8848-8852 (1998)).

Thus, in another embodiment, the invention provides a chimeric
monooxygenase polynucleotide shuffling template. Preferred templates are derived from the
P-450 superfamily of monooxygenases.

Cytochrome P450 constitutes a superfamily of over 1000 members. These
proteins are grouped based on their heme prosthetic group and alignments. The sequence
identity between the various P450 families is quite low, but the protein three dimensional
folds are very similar. Hence alignments can easily be made between P450s using multiple
sequence alignment tools such as Clustal, DIALIGN, FASTA, MEME, and Block Maker. If
a number of programs are used, a consensus alignment is evident, especially around critical
residues such as the cysteine bound to the heme.

There are four P450 crystal structures known, P450 -cam, -terp, -eryF and
-BM-P, and they all show similar architecture. Although all of the known crystal structures
are for bacterial P450s, when alignments are done to mammalian enzymes, predictions about
the active site pockets and residues can be made. Site directed mutation studies based upon
this scheme have experimentally verified the importance of the predicted residues in
substrate binding. Gotoh (J. Biol. Chem. 267:83-90) describes a model of CYP 2C9, based
on P450cam, which others have used and verified. For use of the BM-P structure to
model/mutate CYP 4A proteins, see, J. Biol. Chem. Sep 4; 273(36):23055-61 (1998).

In another aspect, the invention provides a method of obtaining a
polynucleotide that encodes a recombinant P450 polypeptide comprising a backbone domain
and an active site domain. The method involves: (a) recombining at least first and second
forms of a nucleic acid that encodes a P450 active site domain, wherein the first and second
forms differ from each other in two or more nucleotides to produce a library of recombinant
active site domain encoding polynucleotides; and (b) linking the recombinant active site
domain-encoding polynucleotide to a backbone-encoding polynucleotide so that the active
site-encoding domain and the backbone-encoding domain are in-frame.

In yet another aspect, the invention provides a method of obtaining a
polynucleotide that encodes a recombinant P450 polypeptide comprising a backbone domain
and an active site domain. The method involves: (a) recombining at least first and second
forms of a nucleic acid that encodes a P450 backbone domain, wherein the first and second
forms differ from each other in two or more nucleotides to produce a library of recombinant
backbone domain encoding polynucleotides; and (b) linking the recombinant backbone
domain-encoding polynucleotide to an active site-encoding polynucleotide so that the
backbone-encoding domain and the active site-encoding domain are in-frame.

In a still further aspect, the invention provides a method of obtaining a
polynucleotide that encodes a recombinant P450 polypeptide comprising a backbone domain
and an active site domain. The method involves: (a) recombining at least first and second
forms of a nucleic acid that encodes a P450 active site domain, wherein the first and second
forms differ from each other in two or more nucleotides to produce a library of recombinant
active site domain encoding polynucleotides; (b) recombining at least first and second forms
of a nucleic acid that encodes a P450 backbone domain, wherein the first and second forms
differ from each other in two or more nucleotides to produce a library of recombinant
backbone domain encoding polynucleotides; and (c) linking the recombinant active site
domain-encoding polynucleotide to the recombinant backbone-encoding polynucleotide so
that the recombinant active site-encoding domain and the recombinant backbone-encoding
domain are in-frame.
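
The in-frame requirement of the linking step can be checked computationally before any
construct is made. The Python sketch below is a simplified illustration, not the linking
method itself: it assumes each segment is supplied as whole codons in the intended reading
frame, places the active site segment upstream of the backbone segment, and rejects fusions
with a premature stop codon. All sequences and function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(dna: str) -> list:
    """Split a DNA string into consecutive three-base codons."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def link_in_frame(active_site: str, backbone: str) -> str:
    """Join an active-site segment to a backbone segment, preserving the frame.

    Raises ValueError if either segment is not made of whole codons or if the
    fusion contains a stop codon before its final codon.
    """
    if len(active_site) % 3 or len(backbone) % 3:
        raise ValueError("each segment must contain whole codons")
    fusion = active_site + backbone
    if any(codon in STOP_CODONS for codon in split_codons(fusion)[:-1]):
        raise ValueError("premature stop codon in fusion")
    return fusion


def build_chimera_library(active_site_library, backbone: str) -> list:
    """Link every usable active-site variant to the same backbone."""
    library = []
    for segment in active_site_library:
        try:
            library.append(link_in_frame(segment, backbone))
        except ValueError:
            continue  # skip variants that would break the reading frame
    return library


# Hypothetical segments (assumption: illustrative only).
ACTIVE_SITE_LIBRARY = ["ATGGCTAAG", "ATGGCAAAA", "ATGTAAAAG", "ATGGCTAA"]
BACKBONE = "CTGACCTGCTAA"

if __name__ == "__main__":
    for fusion in build_chimera_library(ACTIVE_SITE_LIBRARY, BACKBONE):
        print(fusion)
```

Of the four hypothetical variants, the third carries an internal stop codon and the fourth
is one base short, so only the first two yield in-frame fusions.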

The linking of the various nucleic acids in each of the above aspects can be
accomplished by methods well-known in the art. Moreover, in each of the above aspects,
certain embodiments are presently preferred. For example, in a preferred embodiment, the
backbone P450 (BM-P in this example) refers to the C-terminus of the protein which
contains the proximal cysteine (residue 400) ligand to the prosthetic heme. The N terminus
of the desired P450 isozyme is transferred onto this structure. In a preferred embodiment the
junction between the two sequences occurs at an end of the I helix (e.g., residue 282). In
another preferred embodiment the junction between the two proteins occurs in the G-H loop
(residues 227-232 preferably). In another preferred embodiment solely the F and G helices
(residues 171-226) are transferred into the backbone P450 with the remaining sequence
being from the backbone P450.
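
The junction choices described above can be expressed as simple operations on aligned
protein sequences. The Python sketch below assumes, for illustration only, that the donor
isozyme has already been aligned to the backbone so that both share the backbone residue
numbering; the placeholder sequences of repeated letters stand in for real proteins, and the
function names are hypothetical.

```python
def _check_aligned(donor: str, backbone: str) -> None:
    if len(donor) != len(backbone):
        raise ValueError("donor must be aligned to the backbone numbering")


def n_terminal_chimera(donor: str, backbone: str, junction: int) -> str:
    """Residues 1..junction from the donor, the remainder from the backbone.

    Residue numbers are 1-based, following the backbone numbering.
    """
    _check_aligned(donor, backbone)
    return donor[:junction] + backbone[junction:]


def segment_chimera(donor: str, backbone: str, first: int, last: int) -> str:
    """Residues first..last (1-based, inclusive) from the donor, rest from backbone."""
    _check_aligned(donor, backbone)
    return backbone[:first - 1] + donor[first - 1:last] + backbone[last:]


# Placeholder sequences (assumption: 450 residues each, not real proteins).
DONOR = "D" * 450
BACKBONE = "B" * 450

if __name__ == "__main__":
    i_helix = n_terminal_chimera(DONOR, BACKBONE, 282)    # junction at end of I helix
    fg_swap = segment_chimera(DONOR, BACKBONE, 171, 226)  # F and G helices only
    print(i_helix.count("D"), fg_swap.count("D"))
```

With a junction at residue 282, residue 400 (the proximal cysteine position in the backbone
numbering) is still contributed by the backbone, consistent with the preferred embodiment.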

Using the above methods, chimeric monooxygenases having optimized
activities can be obtained. The activities that are optimized include any of the activities
towards any of the substrates described herein.

Generating a focused P450 library of chimeras, steroid hydroxylases for
example, typically begins with an investigation of the literature, especially the drug
metabolism area, for isozymes known to catalyze the desired chemistry. Once identified,
these isozymes are aligned, using the relevant programs, to one of the P450s with a known
x-ray structure (P450 -cam, -terp, -eryF and -BM-P), preferably BM-P. Once the alignment
is achieved, the putative active site regions are generated and isolated for further study.

Inspection of the published structures for P450s (see, for example, P.N.A.S.
96:1863-1868 (1999); Nature Struct. Biol. 4:140-146 (1997)) and structure function studies
(see, for example, Drug Metab. Dispos. 26:1223-1231 (1998), for a review) are used to
highlight the sites at which chimeras are preferably constructed. For the purpose of clarity,
all residue numbers refer to an exemplary sequence, CYP 102 P450 BM-P. This focus is not
intended to limit the invention, as it is apparent that it is the positions in the structural motif
of the protein that are relevant, not the absolute residue number. The positions of the
structural motifs may be determined by methods including crystal structure determination,
sequence alignment and homology modeling. Indeed a small extension of the sequence
beyond the chosen region may be transferred into the chimera.

The method provides a series of chimeric nucleic acids which include
sequences, chosen as described above, from the P450 isozymes known to catalyse the
desired chemistry and the remainder of a soluble bacterial P450, preferably one of the
structurally defined P450s, most preferably P450BM-P, most preferably still an already
improved chimeric monooxygenase nucleic acid. These chimeric nucleic acids can be used
as substrates for shuffling in any of the above-described shuffling formats.

In one embodiment the entire polynucleotide is improved by shuffling. In a
preferred embodiment, the heme domain of the P450 component of the chimera is shuffled.
In another preferred embodiment the active site region of the P450 isozymes is shuffled. In
yet another preferred format the active site sequences described above are shuffled before
chimera formation. In this format the improved nucleic acids are cloned into the P450
backbone to create a library of improved monooxygenases.

In another preferred format, one or more of the desired P450 isozyme active
sites are not transformed into a chimeric nucleic acid. The diversity encoded by these
sequences is captured by the inclusion of oligonucleotides encoding the sequence of
interest as described in the above-described shuffling format.

One advantage of this process is that the formation of chimeric P450
nucleotides allows the production of polypeptides having any P450 activity in the same
system. Thus the creation of an improved nucleic acid with one activity may start from a
previously improved chimeric nucleic acid encoding a different activity. This recursive
synergy leads to rapid improvement of the monooxygenase nucleic acid for any and all of
the desired properties.

Another advantage of this process is the improvement in stability and ease of
expression of polypeptides with the activity of a eukaryotic, membrane associated, P450 as a
soluble bacterial protein. This leads to significant improvement in the expression level,
stability, and ease of handling of any polypeptide encoded by the improved nucleic acid.

A third advantage of this process is the ability to create improved nucleic
acids for a particular activity without isolation of the nucleic acid encoding that activity.
Each chimeric nucleic acid will be expressed and screened in substantially similar fashion
for any of the reactions described herein.

Thus any reaction described in the literature of biotransformation and drug
metabolism and known to those skilled in the art, such as those described herein, encoded by
a P450 nucleic acid can be performed by a chimeric nucleic acid of the type described.

B. Reactions of Improved Monooxygenases 

In another aspect, the invention provides a method for obtaining a
polynucleotide encoding an improved polypeptide acting on a substrate comprising a target
group selected from an olefin, a terminal methyl group, a methylene group, an aryl group
and combinations thereof. The improved polypeptide exhibits one or more improved
properties compared to a naturally occurring polypeptide acting on said substrate. The
method includes: (a) creating a library of recombinant polynucleotides encoding a
monooxygenase polypeptide acting on said substrate; and (b) screening said library to
identify a recombinant polynucleotide encoding an improved polypeptide that exhibits one
or more improved properties compared to a naturally occurring monooxygenase polypeptide.

In a preferred embodiment, the library of recombinant polynucleotides is
created by recombining at least a first form and a second form of a nucleic acid. At least one
of these forms encodes the naturally occurring polypeptide or a fragment thereof.
Preferably, the first form and the second form differ from each other in two or more
nucleotides. In a further preferred embodiment, the first and second forms of the nucleic
acid are homologous.

In addition to the methods described above for producing the encoding
polynucleotides, the present invention also provides the polypeptides encoded by these
polynucleotides and methods using these peptides for synthesizing valuable organic
compounds. Some of these polypeptides and methods of using them are set forth below.

It is noted that the basic chemistry described below with reference to
monooxygenases is known. In addition to Ortiz de Montellano, supra, a general guide to the
various chemistries involved is found in Stryer (1988) Biochemistry, third edition (or later
editions) Freeman and Co., New York, NY; Pine et al. ORGANIC CHEMISTRY, FOURTH
Edition (1980) McGraw-Hill, Inc. (USA) (or later editions); March, Advanced Organic
Chemistry: Reactions, Mechanisms and Structure, 4th ed., J. Wiley and Sons (New York,
NY, 1992) (or later editions); Greene et al., Protective Groups In Organic Chemistry,
2nd Ed., John Wiley & Sons, New York, NY, 1991 (or later editions); Lide (ed.) The CRC
Handbook of Chemistry and Physics 75th edition (1995) (or later editions); and in the
references cited in the foregoing. Furthermore, an extensive guide to many chemical and
industrial processes applicable to the present invention is found in the Kirk-Othmer
Encyclopedia of Chemical Technology (third edition and fourth edition, through year
1998), Martin Grayson, Executive Editor, Wiley-Interscience, John Wiley and Sons, NY,
and in the references cited therein ("Kirk-Othmer").

The following chemistries illustrate those generally accessible through the
heme-dependent P450 monooxygenase/peroxidase superfamily. Certain useful reaction
types are set forth in Fig. 1.

Family shuffling approaches apply to enhancing performance of
monooxygenase polypeptides useful in each of the following classes of industrial chemical
transformation. Other monooxygenase enzyme classes are also useful in practicing the
present invention. Moreover, other polypeptides accessible through the present invention,
and methods of using these polypeptides, will be apparent to those of skill in the art.

1. Oxidation of π-bonds to epoxides

Among the most high-value classes of commodity chemical transformations
is the catalytic epoxidation of terminal olefins to corresponding epoxides. Indeed, ethylene
oxide, propylene oxide, epichlorohydrin, glycidol, butylene oxide and bis-A-diglycidyl
ethers and their immediate downstream derivatives account for a significant fraction of the
entire $350 B/yr global chemical industry. Typically, prior art P450 activities are limited by
low turnover number, low affinity, low stability under the conditions of interest and/or
enzyme inactivation by alkylation or free-radical-dependent mechanisms. Moreover, such
chemistry is often associated with rapid inactivation of the heme-dependent enzyme. Family
shuffling approaches to enzyme improvement are used to markedly reduce the sensitivity of
the monooxygenases to this mode of inactivation.

In a preferred embodiment, the present invention provides an improved
polypeptide that is capable of converting an olefin into an epoxide. Moreover, there is
provided a method for converting an olefin to an epoxide. The method includes contacting
the olefin substrate with the polypeptide. In a still further preferred embodiment, the
substrate is contacted with an organism that expresses the polypeptide.

In another preferred embodiment, the polypeptides are those encoded by
monooxygenase genes that can be recruited and optimized by DNA shuffling. A range of
monooxygenases known in the art provide appropriate starting points for determining a
polypeptide useful in this aspect of the invention. One useful class of monooxygenases is
exemplified by the heme-dependent eukaryotic and bacterial cytochrome P-450s.

Heme-containing enzymes of the P450 family exhibit a wide array of
catalytic activities of interest in the context of metabolizing xenobiotics and environmental
and biochemical waste products. Of the diverse chemistries catalyzed by this class of
enzymes, a number are of industrial chemical interest.

As an enzyme class, the P450 family exhibits notable activities toward many
classes of compounds. For example, in the presence of oxygen and an intact redox recycle
system, P450s exhibit monooxygenase activity. Addition of hydrogen peroxide or other
peroxides, however, can be used to circumvent the NAD(P)H requirement (i.e., allowing for
peroxidase activity) toward many of the same substrates.

In a further preferred embodiment, polypeptides based on, or analogous to,
non-heme-dependent monooxygenases are used to effect epoxidation of olefins. Such
monooxygenases include, but are not limited to, non-heme monooxygenases involved in the
degradation of styrene by bacteria (as exemplified by the genes and enzymes
described by Marconi et al. Appl. Environ. Microbiol. 62(1):121-127 (1996); Beltrametti et
al. Appl. Environ. Microbiol. 63(6):2232-2239 (1997); O'Connor et al. Appl. Environ.
Microbiol. 63(11):4287-4291 (1997); Velasco et al. J. Bacteriol. 180(5):1063-1071 (1998);
Itoh et al. Biosci. Biotechnol. Biochem. 60(11):1826-1830 (1996)), or in the degradation of
methyl-substituted aromatic compounds such as toluene, xylenes, p-cymene (exemplified by
xylene monooxygenase, Wubbolts et al. Enzyme Microb. Technol. 16(7):608-615 (1994)).

The following is a non-limiting list of exemplary monooxygenase genes
which can be recruited and optimized by DNA shuffling for the purpose of epoxidizing
olefins:

[AF031161] styrene monooxygenase (epoxide-forming) of Pseudomonas sp. VLB120, stdA, stdB;
[PFSTYABCD] styrene monooxygenase of P. fluorescens (styA, styB);
[PSSTYCATA] styrene monooxygenase of Pseudomonas sp.;
[PSEXYLMA, AF019635, D63341, E02361] xylene/toluene monooxygenase of Pseudomonas
putida TOL plasmid (xylM, xylA);
[PPU24215] p-cymene monooxygenase of P. putida;
[PSETBMAF] toluene/benzene-2-monooxygenase (tbmA-tbmF) of Pseudomonas sp.;
[PPU04052] toluene-3-monooxygenase of Pseudomonas pickettii PKO1;
[AF001356] toluene-S-monooxygenase of Burkholderia cepacia; and
[AF043544] nitrotoluene monooxygenase of Pseudomonas sp. TW3, NtnMA (ntnM, ntnA).

A variety of strains are known to contain monooxygenases capable of epoxide
formation. For example, Pseudomonas aeruginosa is known to have a
monooxygenase capable of epoxidizing 1-octene to 1,2-epoxyoctane. The most
comprehensive studies on bacterial alkene epoxidation have been done on Pseudomonas
oleovorans. Work on P. oleovorans by May and coworkers (J. Biol. Chem. 248:1725-1730,
1973) shows that the monooxygenase contained in the cells is capable of epoxidizing octene
to 1,2-epoxyoctane in 70% enantiomeric purity. In addition, this enzyme is capable of
converting 1,7-octadiene to the diepoxide (May et al. J. Am. Chem. Soc. 98:7856-7858) and
1,5-hexadiene and 1,11-dodecadiene to epoxides. However, smaller alkenes are often
converted to alcohols. Cells grown up overnight under standard conditions can be used
intact or as lysates and, in both cases, have been observed to give yields of ~1 g/L.
Increasing the rate of accumulation of the reactive epoxide is clearly one of the preferred
objectives of gene shuffling as set forth herein.

This enzyme system is also capable of mediating hydroxylation of longer
chain alkanes (octanes, etc.) and fatty acids. The enzyme has been cloned and sequenced
and is composed of three protein components: rubredoxin (mw 19,000), NADH-rubredoxin
reductase, and the hydroxylase (a non-heme iron protein). Whereas there are scenarios (such
as when overall stability of the system is an issue) in which shuffling of the genes for all
three protein components is preferred, when the primary improvement is related to the
kinetics, affinity or inhibition profile of the monooxygenase, the preferred shuffling strategy
will be to shuffle homologs of the hydroxylase (epoxygenase) component.

Microorganisms having MO enzyme activities with similar properties include
the genera Rhodococcus, Mycobacterium, Nocardia (Nocardia corallina B-276) and
Pseudomonas, and Corynebacterium equi (IFO 3730), which can be grown on n-octane and
which exhibit the capacity to oxidize 1-hexene to optically pure (R)-(+)-epoxide. This
strain also assimilates other terminal olefins and converts them to epoxides. Yields decrease
to <1% with carbon chains of >14. Increasing the activity of the enzyme toward longer
chain length alkenes is a target for evolving additional catalysts for chirally selective
epoxidations. Such monomers have high value as pharmaceutical and agricultural
intermediates.

Experiments with Pseudomonas putida, Nocardia corallina B-276 and
Bacillus megaterium suggest that the monooxygenase activity of these organisms derives
from a soluble P450-dependent system. All of these strains are available from ATCC and
serve as exemplary sources for the genes which can be isolated by hybridization and gene
amplification methods.

Mycobacterium sp. (E20) and Mycobacterium sp. (Py1) show activity even
toward short-chain, gaseous olefins such as ethylene. In the case of both ethylene and
propylene, the epoxide products are formed almost exclusively. Catalyst performance
experiments are performed in a gas-solid reactor to prevent accumulation of toxic ethylene
oxide in the immediate vicinity of the biocatalyst. An experimental set-up allows for
automatic gas chromatography analysis of circulation gas in a batch reactor system and
for online monitoring of the microbial (or enzymatic) oxidation of gaseous alkenes
(ethylene, propylene and butylene). Optimization of the process is achieved by studying the
influence of various organic solvents and physical conditions on retention of immobilized
cell/enzyme activity.

High activity retention is favored by low polarity, high molecular weight
solvents, although this is also selectable following DNA shuffling. Using chiral gas
chromatography, wild type (wt) strains and strains containing candidate evolved
polypeptides are screened with respect to the stereospecificity of the epoxidation of propene,
1-butene and 3-chloro-1-propene. Results show that a wide range of chiral selectivity or
nonselectivity emerges from a typical series of family shuffling and screening experiments.
Novel polypeptides favoring the S, rather than the R, stereoisomer can also be shuffled and
selected. Inactivation of the alkene epoxidation system by the produced epoxide has been
one of the key historical limitations of the system. Again, gene and family shuffling
combined with appropriate selection methods and screens are used to identify polypeptides
with improved stability in the presence of epoxide products.
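
A screen for stereospecificity of this kind can be scored from chiral gas chromatography
peak areas using the standard definition of enantiomeric excess, ee = 100 x (R - S)/(R + S).
The Python sketch below shows one way to rank clones by preference for the S stereoisomer;
the peak areas and clone names are invented for illustration and are not experimental
results.

```python
def enantiomeric_excess(area_r: float, area_s: float) -> float:
    """Signed enantiomeric excess in percent.

    Positive values favor the R stereoisomer, negative values favor S.
    """
    total = area_r + area_s
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * (area_r - area_s) / total


def rank_for_s_selectivity(peak_areas: dict) -> list:
    """Clone names ordered from most S-selective to most R-selective."""
    return sorted(peak_areas, key=lambda name: enantiomeric_excess(*peak_areas[name]))


# Hypothetical (R area, S area) pairs per clone (assumption: illustrative only).
PEAK_AREAS = {
    "wt": (85.0, 15.0),
    "variant_1": (40.0, 60.0),
    "variant_2": (10.0, 90.0),
}

if __name__ == "__main__":
    for name in rank_for_s_selectivity(PEAK_AREAS):
        print(name, enantiomeric_excess(*PEAK_AREAS[name]))
```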

A number of other methane-grown methylotrophic bacteria (Methylosinus
trichosporium, Methylobacterium capsulatus and Methylobacterium organophilum) have all
been shown to contain a methane monooxygenase (MMO) system analogous to the
well-characterized Pseudomonas oleovorans system. Again, standard hybridization and gene
amplification methods provide a straightforward approach to isolate those genes which are
not yet reported in the literature. Sequences of MMOs from some of these organisms are
known and can be obtained from public sequence databases such as GenBank, Entrez®,
and others.

Moreover, one species of Rhodococcus rhodochrous has been shown to be
capable of oxidizing propane and propene to epoxide and hydroxylated products without 
inhibition by the products. The unique monooxygenase from this organism provides an 
important material to incorporate in family shuffling formats to expand activity of shuffled 
nucleic acids. 

2. Hydroxylation of organic substrates 

In another embodiment, the present invention provides a monooxygenase
polypeptide capable of hydroxylating organic substrates. In an exemplary embodiment, the
polypeptide oxidizes a methyl or a methylene group. In a preferred embodiment, the
polypeptide oxidizes a terminal methyl group to a hydroxymethyl group. In yet another
preferred embodiment, the invention provides an improved monooxygenase polypeptide that
acts on a methylene group to form a secondary alcohol. Preferred organic substrates include
a target group selected from arylmethyl, substituted arylmethyl, arylmethylene, substituted
arylmethylene, heteroarylmethyl, substituted heteroarylmethyl, alkyl-terminal methyl, fatty
acid, terpenes and combinations thereof. The improved polypeptide is prepared using the
methods of the invention and exhibits one or more improved properties compared to a
naturally occurring polypeptide.

In addition to the polypeptide, there is provided a method for converting a
terminal methyl or internal methylene into the corresponding alkyl hydroxy group. The
method includes contacting the substrate with the polypeptide. In a still further preferred
embodiment, the substrate is contacted with an organism that expresses the polypeptide.

P450s mediate the conversion of many of the molecular species listed above, 
including oxidation of toluene to form benzyl alcohol and oxidation of 2-phenyl-propane to 
2-phenyl-1-propanol. Monooxygenase enzymes from Pseudomonas gladioli, Aspergillus 
niger and other species are known to oxidize monoterpenes as well as higher terpenes. 
Conversion of monoterpenes to terminal unsaturated alcohols (without disruption of alkene 
functionalities) is a remarkable aspect of monooxygenase mediated conversions (see, 
Enzyme Catalysis in Organic Synthesis, Vol. II, Chapter B.6.1.4 (ed. by K. Drauz and 
H. Waldmann, VCH Publishers, Inc., 1995)). The powerful monooxygenase system of 
Pseudomonas oleovorans is also known to transform linear and branched-chain alkanes to 
alcohols, aldehydes, acids and hydroxy acids. 

Members of the P450 superfamily typically favor formation of primary 
alcohols. An example of a P450-mediated hydroxylation of interest is the ω and ω-1 
hydroxylation of fatty acids, such as lauric acid. P450s such as CYP2B4, CYP2B1 and 
related sequences demonstrate this activity toward a number of hydrocarbon substrates. 
Shuffling members of this subfamily leads to polypeptides with altered specificity and 
enhanced stability. 

Many polypeptides capable of arylmethyl group oxidation are well known in 
the art. For example, the introduction of oxygen into methyl groups and methylene groups is 
mediated by non-heme multicomponent monooxygenases of toluene, xylenes and p-cymene. 

While much of the discussion above focuses on constructing polypeptides and 
pathways for oxidation of arylmethyl compounds, this discussion is also directly applicable 
to polypeptides and pathways for oxidizing terminal methyl and internal methylene groups 
of both alkyl and aryl-substituted alkyl groups. In a preferred embodiment, the substrate is 
an aryl-substituted alkyl group (see, Fig. 2). 

This step is accomplished by recruiting one or more genes encoding an 
appropriate monooxygenase activity. In a preferred embodiment, this is accomplished by 
shuffling and expressing a suitable cytochrome P450 type enzyme system. The enzymes of 
this class are ubiquitous in nature, and they can be found in a variety of organisms. For 
example, n-propylbenzene is known to undergo a-oxidation in strains of Pseudomonas 
desmolytica S449B1 and Pseudomonas convexa S107B1 (Jigami et al., Appl. Environ. 
Microbiol. 38(5):783-788 (1979)). 

Similarly, alkane monooxygenases of bacterial origin, or cytochromes P450 
for camphor oxidation, whether wild-type or mutant, can be recruited for the purpose of 
introducing the oxygen into the terminal methyl group of alkylaryl compounds, wherein the 
alkyl group is generally other than a methyl group (Lee et al., Biochem. Biophys. Res. 
Commun. 218(1):17-21 (1996); van Beilen et al., Mol. Microbiol. 6(21):3121-3136 (1992); 
Kok et al., J. Biol. Chem. 264(10):5435-5441 (1989); Kok et al., J. Biol. Chem. 
264(10):5442-5451 (1989); Loida and Sligar, Protein Eng. 6(2):207-212 (1993)). 
Furthermore, the mammalian metabolic pathways for these and structurally related 
alkylaromatic hydrocarbons indicate a cytochrome P450 dependent chiral oxidation of the 
terminal methyl group and subsequent oxidation to the corresponding 2-arylpropanoic or 
2-arylacetic acids, indicating that these P450s are excellent shuffling substrates (Matsumoto 
et al., Chem. Pharm. Bull. (Tokyo) 40(7):1721-1726 (1992); Matsumoto et al., Biol. Pharm. 
Bull. 17(11):1441-1445 (Nov 1994); Matsumoto et al., Chem. Pharm. Bull. (Tokyo) 
43(2):216-222 (1995); Ishida and Matsumoto, Xenobiotica 22(11):1291-1298 (1992)). 

Examples of monooxygenase genes suitable for use in the construction of 
strains for oxidation of the methylarenes include: 

[PSEXYLMA, AF019635, D63341, E02361] xylene/toluene monooxygenase 
of Pseudomonas putida TOL plasmid (xylM, xylA); [PPU24215] p-cymene 
monooxygenase of P. putida; [AF043544] nitrotoluene monooxygenase of 
Pseudomonas sp. TW3, NtnMA (ntnM, ntnA); [SMU40233 and SMU40234] 
alkane monooxygenase of Stenotrophomonas maltophilia; [POOCT] alkane 
monooxygenase of Pseudomonas oleovorans TF4-1L (+OCT) plasmid, alk 
genes; and camphor 5-monohydroxylase of P. putida (CAM plasmid). 

Alternatively, for the purpose of using non-heme-dependent oxidation of 
the arylalkyl compounds, useful monooxygenases are exemplified by a variety of non-heme 
monooxygenases involved in the bacterial degradation of styrene (as exemplified 
by the corresponding genes and enzymes described by Marconi et al., Appl. Environ. 
Microbiol. 62(1):121-127 (1996); Beltrametti et al., Appl. Environ. Microbiol. 63(6):2232- 
2239 (1997); O'Connor et al., Appl. Environ. Microbiol. 63(11):4287-4291 (1997); Velasco 
et al., J. Bacteriol. 180(5):1063-1071 (1998); Itoh et al., Biosci. Biotechnol. Biochem. 
60(11):1826-1830 (1996)); or in the degradation of methyl-substituted aromatic compounds 
such as toluene, xylenes, p-cymene (exemplified by xylene monooxygenase, Wubbolts et 
al., Enzyme Microb. Technol. 16(7):608-615 (1994)). 

Exemplary non-heme monooxygenases useful in practicing the present 
invention include: 

[AF031161] styrene monooxygenase (epoxide-forming) of Pseudomonas sp. 
VLB120, stdA, stdB; [PFSTYABCD] styrene monooxygenase (epoxide- 
forming) of P. fluorescens (styA, styB); [PSSTYCATA] styrene 
monooxygenase (epoxide-forming) of Pseudomonas sp.; [PSEXYLMA, 
AF019635, D63341, E02361] xylene/toluene monooxygenase of 
Pseudomonas putida TOL plasmid (xylM, xylA); [PPU24215] p-cymene 
monooxygenase of P. putida; [PSETBMAF] toluene/benzene-2- 
monooxygenase (tbmA-tbmF) of Pseudomonas sp.; [PPU04052] toluene-3- 
monooxygenase of Pseudomonas pickettii PKO1; [AF001356] toluene-3- 
monooxygenase of Burkholderia cepacia; [AF043544] nitrotoluene 
monooxygenase of Pseudomonas sp. TW3, NtnMA (ntnM, ntnA). 

3. Aromatic hydroxylation 

Hydroxylated aromatic compounds are an important group of industrial 
chemicals. Carboxylic acids, esters and lactones of hydroxylated aromatic compounds are of 
particular value and interest. Thus, in another preferred embodiment, the invention provides 
an improved monooxygenase polypeptide that can oxidize an aryl compound to a 
hydroxyaryl compound (Fig. 1). Additionally, there is provided a method utilizing an 
improved monooxygenase polypeptide to effect the transformation of an aryl group to a 
heteroaryl group. The method includes contacting a substrate comprising an aryl group with 
the polypeptide. In yet another preferred embodiment, the substrate is contacted with an 
organism that expresses the polypeptide. 

Presently preferred substrates include, for example, aryl groups, substituted 
aryl groups, heteroaryl groups and substituted heteroaryl groups. Compounds representative 
of these generic groups include industrially significant substrates such as biphenyl, 
benz-[a]-pyrene, aniline, toluene, naphthalene, cumene, haloaromatics and phenanthrene. 

Many monohydroxy aromatic compounds can be generated by using 
heme- and/or non-heme-containing type monooxygenases. To be useful in the 
biotransformation pathway, preferred polypeptides will have a sufficiently high turnover 
rate and they will not be readily deactivated in the presence of the substrates, 
intermediates or products of the oxidation reaction. This characteristic is an ideal 
candidate for improvement by the shuffling process disclosed herein. 

This class of reactions includes, for example, the modification of such 
industrially significant substrates as benzene, biphenyl, benz-[a]-pyrene, aniline, toluene, 
naphthalene, cumene, haloaromatics and phenanthrene. These modifications are all of 
considerable industrial chemical importance and are all carried out by members of the 
P450 superfamily. 

4. S-dealkylation of alkylsulfur compounds 

S-Dealkylation of reduced thio-organics, such as oxidation of parathion, can 
be mediated by the use of improved monooxygenases. Sulfoxidation of numerous 
organosulfur compounds is also observed and can be enhanced by shuffling 
monooxygenases. Thus, in another preferred embodiment, the invention provides an 
improved monooxygenase polypeptide that can oxidize penicillin G to penicillin G 
S-oxide, a key intermediate in the synthesis of cephalosporins. 

5. O-Dealkylation of alkyl ethers 

Whereas S and N-alkyl groups are oxidized by monooxygenases to the 
corresponding oxides, the electronegativity of oxygen dictates a different mechanistic 
pathway, namely rearrangement of the O-alkyl bond. Synthetic pathways utilizing this 
reaction motif can be improved by shuffling monooxygenases. 

6. Oxidation of aryloxy phenols 

Monooxygenase mediated reactions such as the conversion of 
p-(p-nitrophenoxy)phenol to quinone can be enhanced by shuffling monooxygenases. 

7. Dehydrogenation 

In some cases, the monooxygenase polypeptides of the invention operate as 
dehydrogenases rather than as oxygenases or peroxidases. For example, conversion of 
saturated hydrocarbons to unsaturated hydrocarbons, conversion of alcohols to aldehydes, 
carboxylic acids and ketones, conversion of aldehydes to carboxylic acids and the 
desaturation of nitrogen compounds have been observed. A classic example of this is the 
conversion of dihydronaphthalene to naphthalene. Conversion of valproic acid to 
2-n-propyl-pentenoic acid also illustrates this chemistry, as does conversion of lindane 
(1,2,3,4,5,6-hexachlorocyclohexane) to hexachlorocyclohexene. Numerous other examples of 
this classic P450 chemical transformation exist, such as conversion of acetaldehyde or 
propionaldehyde to acetic and propionic acid, respectively. The CYP2C29 enzyme, for 
example, converts aliphatic alpha-beta unsaturated aldehydes (and anthraldehyde) to the 
corresponding acids. Shuffling of these and related P450s provides improved properties, 
such as enhanced activity, specificity and/or P450 stability. 

Moreover, P450-based dehydrogenation chemistry also plays an important 
role in the biosynthesis of various steroids, and is, therefore, of considerable commercial 
interest in synthesizing steroid-based pharmaceuticals such as cortisol and other steroidal 
anti-inflammatory agents. 

Thus, in another embodiment, the present invention provides a method for 
obtaining a nucleic acid encoding an improved monooxygenase polypeptide having 
dehydrogenase activity. In a preferred embodiment, the improved polypeptide acts on a 
substrate to dehydrogenate a hydroxyalkyl group to a member selected from: 

-COOH and -C(O)H. 

Preferred substrates include members selected from the group of arylmethyl, 
substituted arylmethyl, heteroarylmethyl, substituted heteroarylmethyl, alkyl-terminal 
methyl, substituted alkyl-terminal methyl, and the like, as well as combinations thereof. 

The improved polypeptide of the invention exhibits one or more improved 
properties compared to a naturally occurring polypeptide. Producing the polypeptide by the 
method of the invention involves creating a library of recombinant polynucleotides encoding 
a polypeptide acting on the substrate; and screening the library to identify a recombinant 
polynucleotide encoding the improved polypeptide. 

Moreover, there is provided a dehydrogenase polypeptide prepared by the 
method of the invention. A method for utilizing this polypeptide to oxidize a hydroxyalkyl 
group is also provided. The method involves contacting a substrate 
having a hydroxyalkyl group with a polypeptide of the invention, more preferably with an 
organism expressing a polypeptide of the invention. 

8. Decarbonylation 

Examples of this important chemistry include conversion of 
cyclohexanecarboxaldehyde to cyclohexane and formic acid. Conversion of 
isobutyraldehyde, trimethylacetaldehyde, isovaleraldehyde, 2-methyl-butyraldehyde, 
citronellal and 2-phenyl-propionaldehyde to their corresponding decarbonylated products is 
also observed. This chemistry is not observed with unbranched aldehydes such as 
propionaldehyde and valeraldehyde. This is an important class of catalytic chemistry not 
easily duplicated abiotically. CYP2B4 is a preferred target for shuffling to improve the 
native activity of this P450. Shuffling of this family of P450 MOs results in polypeptides 
with activity toward unbranched aldehydes such as adipaldehyde, valeraldehyde and/or 
propionaldehyde. 

10. Oxidative dehalogenation of haloaromatics and halohydrocarbons 

Exemplary substrates for these reactions include polychlorobenzenes, 
trichloroethylene, di- and trichloropropane, 1,2-dichloroethane and 1,2-, 1,3- and 1,4- 
dihydroketones. 

11. Baeyer-Villiger monooxygenation 

This reaction involves the oxidation of aromatic, open-chain and cyclic 
ketones to esters and lactones. 

12. Exemplary embodiments utilizing monooxygenases 

a. Cyclosporin 

Cyclosporin A is a nonribosomal peptide drug with antifungal and 
immunosuppressive properties that is widely used as an immunosuppressant after transplant 
surgery. There currently exist at least 25 cyclosporin derivatives with various properties, 
and there is a great demand for new cyclosporin molecules. The creation of new derivatives, 
however, has been hampered by the difficult synthetic chemistry of these large natural 
product molecules (MW ~1200). Therefore, a means of overcoming this limitation of 
traditional chemistry is of great value. 

Cytochrome P450 and other monooxygenase enzymes provide an alternative 
method of making modified cyclosporins. The P450 3A subfamily contains members with 
various activities on cyclosporin A; for example, the 3A5 enzyme can hydroxylate the amino 
acid at position 1, and 3A4 can hydroxylate amino acids 1 and 9 as well as demethylate 
position 4 (Aoyama et al., JBC 264:10388). Other activities exist among the large 3A 
subfamily, consisting of at least 30 members (see, 
http://drnelson.utmem.edu/homepage.html). 

Alignment of 14 of these 3A genes shows homologies of 67-99%. Such 
diversity is ideal for shuffling, and provides a means of creating additional genetic diversity 
in the form of P450 libraries, with concomitant enzymatic diversity. Initial screening for 
new or improved activities can be done in bacteria, as the human 3A4 enzyme and its 
accessory reductase are functional in E. coli (Parikh et al., Nature Biotechnol. 15:784). 
Activity of clones in libraries can be measured by high throughput mass spectroscopy 
detection of product molecules, for example. DNA from clones with improved activity can 
be isolated and shuffled to recombine beneficial mutations, followed by screening for even 
better activity. 

b. Pravastatin 

Pravastatin is a steroid drug which lowers serum cholesterol by competitive 
inhibition of the cholesterol biosynthetic enzyme HMG-CoA reductase. Pravastatin 
(marketed as Pravachol™ by Bristol-Myers Squibb) is produced by a two-step fermentation 
(Serizawa et al., in Biotechnology of Antibiotics, 2nd edition, W.R. Strohl (ed.) (1997) 
New York: Marcel Dekker, pp. 777-805): production of the precursor mevastatin by 
Penicillium citrinum, and then hydroxylation of mevastatin to pravastatin by a cytochrome 
P450 enzyme in Streptomyces carbophilus. 

This invention provides a method to make the second step of this synthesis 
more efficient by increasing the ability of the S. carbophilus P450 to hydroxylate mevastatin. 
The value of this improvement is in decreasing the cost of drug synthesis; much work has 
already gone into optimizing culture conditions (Serizawa et al., 1997), an indication that it 
is an expensive process. 

The P450 that converts mevastatin to pravastatin has been characterized in 
some detail (Watanabe et al., Gene 163:81-85 (1995)). The gene cytP-450sca-2 has been 
cloned and shows homology to other bacterial P450 genes, including 78% identity with the 
S. griseolus gene suaC, whose product is involved in herbicide detoxification (Omer et al., 
Nature 288-291 (1998)), and over 50% identity with several other P450 genes (see below). 
CytP-450sca-2 is functional when overexpressed in the laboratory strain S. lividans. 

Table 1. DNA homology between selected cytochrome P450 genes. 

            CYP105A1   CYP105D1   CYP105B1   CYP105A2   Sca2
            (suaC)     (soyC)     (subC)
CYP105A1       -         58%        51%        56%       78%
CYP105D1                  -         51%        48%       57%
CYP105B1                             -         56%       52%
CYP105A2                                        -        53%
Sca2                                                      -
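
When choosing parents for family shuffling, the pairwise identities of Table 1 can be held in a small lookup structure. The following is only a minimal sketch, assuming the percentages as read from Table 1; the names `IDENTITY`, `identity` and `rank_partners` are hypothetical and are not part of the disclosure.

```python
# Pairwise DNA identities (percent), as read from Table 1.
IDENTITY = {
    ("CYP105A1", "CYP105D1"): 58,
    ("CYP105A1", "CYP105B1"): 51,
    ("CYP105A1", "CYP105A2"): 56,
    ("CYP105A1", "Sca2"): 78,
    ("CYP105D1", "CYP105B1"): 51,
    ("CYP105D1", "CYP105A2"): 48,
    ("CYP105D1", "Sca2"): 57,
    ("CYP105B1", "CYP105A2"): 56,
    ("CYP105B1", "Sca2"): 52,
    ("CYP105A2", "Sca2"): 53,
}

GENES = ["CYP105A1", "CYP105D1", "CYP105B1", "CYP105A2", "Sca2"]


def identity(a, b):
    """Return the percent identity between two genes (order-independent)."""
    if a == b:
        return 100
    key = (a, b) if (a, b) in IDENTITY else (b, a)
    return IDENTITY[key]


def rank_partners(gene):
    """List the other genes, most similar first, with their identity to `gene`."""
    others = [(g, identity(gene, g)) for g in GENES if g != gene]
    return sorted(others, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, pct in rank_partners("Sca2"):
        print(f"{name}: {pct}%")
```

Ranking by identity is only one possible way to order candidate parents; the text itself treats all of the listed genes as suitable members of the family.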



Improvement of the ability of CytP-450sca-2 to convert mevastatin to pravastatin 
can be accomplished by DNA shuffling. The known sequences provide an ideal platform for 
the family shuffling technique, wherein related, functional genes are shuffled together to 
create the initial library for screening/selection. Some of these genes can be obtained 
directly from the microbe in which they were identified (e.g., CYP105A1 and CYP105B1 
from S. griseolus strain ATCC 11796, see Omer et al., 1990). Other genes such as 
CytP-450sca-2 can be assembled from synthetic oligonucleotides. The initial family 
shuffling can be done as described (Crameri et al., 1998). The initial screen for improved 
clones can be done in a surrogate host, such as E. coli or S. lividans; cells can be 
cultured in mevastatin (or the related compound ML-236B.Na; see Watanabe et al., 1995, 
above) and the production of pravastatin detected by high throughput techniques, probably 
mass spectroscopy. The hydroxy group will easily differentiate the product from the 
substrate. The genes can be rescued from the best clones and shuffled together in 
subsequent cycles. The final test would be in an environment resembling actual 
fermentation conditions as much as possible. 
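
The cycle described above (shuffle a family of genes, screen the library, rescue the best clones, shuffle again) can be illustrated with a toy simulation. This is a sketch under loud assumptions and not the laboratory method: sequences are short strings, "shuffling" is modelled as taking each block of positions from a randomly chosen parent, and `toy_score` is a hypothetical stand-in for an assay such as mass spectroscopic detection of product. All names are invented for illustration.

```python
import random


def shuffle_library(parents, size, block, rng):
    """Build chimeras by taking each block of positions from a random parent."""
    length = len(parents[0])
    library = []
    for _ in range(size):
        pieces = [rng.choice(parents)[i:i + block] for i in range(0, length, block)]
        library.append("".join(pieces))
    return library


def evolve(parents, score, rounds=3, size=50, keep=4, block=3, seed=0):
    """Run several shuffle-and-screen cycles; return final parents and best score per round."""
    rng = random.Random(seed)
    parents = list(parents)
    history = []
    for _ in range(rounds):
        # The current parents stay in the pool, so the best score never drops.
        library = shuffle_library(parents, size, block, rng) + parents
        library.sort(key=score, reverse=True)
        parents = library[:keep]  # "rescue" the best clones for the next cycle
        history.append(score(parents[0]))
    return parents, history


# Hypothetical assay stand-in: count positions matching an arbitrary reference.
TARGET = "ACGTACGTACGT"


def toy_score(seq):
    return sum(a == b for a, b in zip(seq, TARGET))


if __name__ == "__main__":
    start = ["ACGTTTTTTTTT", "TTTTACGTTTTT", "TTTTTTTTACGT"]
    best, history = evolve(start, toy_score)
    print(best[0], history)
```

Keeping the previous parents in each screened pool is a design choice of this sketch; it guarantees that the best score is non-decreasing from cycle to cycle.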

c. Herbicide Resistance and Bioremediation 

One set of P450 gene products with activity against herbicides consists of 
SuaC (CYP105A1) and SubC (CYP105B1) from Streptomyces griseolus (Omer et al., J. 
Bacteriol. 172:3335) and related genes from other bacteria. These enzymes are active 
against sulfonylurea herbicides such as chlorimuron ethyl, chlorsulfuron, and sulfomethuron 
methyl (Harder et al., Mol. Gen. Genet. 227:238). Related bacterial P450 genes have been 
identified, with DNA sequence homologies of 48-78% (see, Table 2 below). Because these 
genes are of bacterial origin, they are best suited to bioremediation uses but may also be 
useful for creating herbicide-resistant plants. 

Another set of P450 genes can be isolated from plants with herbicide 
detoxification activities. Such activities are known to be due to plant cytochrome P450s 
(Lau and O'Keefe, Methods Enzymol. 272:235). It is possible to identify the genes 
responsible for this activity, or at least portions of them, by using PCR primers targeted 
to conserved regions of P450s (Holton and Lester, Methods Enzymol. 272:275). 

DNA family shuffling (Crameri et al., Nature 391:288) can be used to create 
hybrid variants from these genes, variants which can be screened for increased herbicide 
metabolism (detoxification). One way to screen for such activity in large numbers of 
samples is by measuring loss of fluorescence due to metabolism of the fluorescent 
sulfonylurea W5822 (DuPont) (see, Harder et al., Mol. Gen. Genet. 227:238). Other suitable 
screening systems employ mass spectroscopy, HPLC and other well-known analytical 
methods. Improved clones can be shuffled together in the next cycle of DNA shuffling for 
further improvement. The best genes can then be transferred to plants and tested for 
conferral of herbicide resistance; further optimization may be necessary to account for 
plant-specific factors. Likewise, for bioremediation uses, final improvement may be 
necessary in the ultimate host. Many additional herbicide applications of P450 shuffling are 
found in the U.S. Patent Application entitled "DNA Shuffling to Produce Herbicide 
Selective Crops", Attorney Docket Number 018097-025600US and assigned U.S.S.N. ______. 
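
The fluorescence-loss screen mentioned above lends itself to a simple ranking calculation. The following is a hedged sketch only: the well readings, the no-enzyme control correction and the function names are assumptions made for illustration, and the text does not specify how the fluorescence data are processed.

```python
def fluorescence_loss(initial, final):
    """Fraction of the starting fluorescence that has disappeared."""
    if initial <= 0:
        raise ValueError("initial fluorescence must be positive")
    return (initial - final) / initial


def rank_clones(readings, control, top=3):
    """Rank clones by fluorescence loss after subtracting the control's loss.

    readings: dict mapping clone name -> (initial, final) fluorescence
    control:  (initial, final) fluorescence of a well without active enzyme
    """
    background = fluorescence_loss(*control)
    corrected = {
        name: fluorescence_loss(initial, final) - background
        for name, (initial, final) in readings.items()
    }
    ranked = sorted(corrected.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    # Invented example readings in arbitrary fluorescence units.
    wells = {
        "clone_a": (1000, 900),
        "clone_b": (1000, 400),
        "clone_c": (1000, 700),
    }
    for name, loss in rank_clones(wells, control=(1000, 950)):
        print(f"{name}: {loss:.2f}")
```

Clones at the top of such a ranking would be the ones carried into the next cycle of shuffling.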

Table 2 displays homology between selected cytochrome P-450 genes 
preferred for use in this embodiment of the invention. 

Table 2. DNA homology between selected cytochrome P450 genes. 

In addition to these monooxygenase mediated reactions, the use of reactions that are 
mediated by polypeptides that do not have monooxygenase activity is also within the scope 
of the present invention. In a preferred embodiment, these non-monooxygenase 
polypeptides will operate on a substrate that has been acted on by a monooxygenase. In 
another preferred embodiment, these polypeptides will operate on a compound prior to its 
being acted on by a monooxygenase. Moreover, it is within the scope of the present 
invention to improve one or more properties of the non-monooxygenase polypeptides by 
shuffling nucleic acids encoding these polypeptides. 



C. Accessory Polypeptides 

In conjunction with the oxidative pathways utilizing polypeptides having 
monooxygenase activity, as discussed above, the present invention provides accessory 
non-monooxygenase polypeptides. As used herein, "accessory polypeptides" refers to those 
polypeptides that do not carry out the initial monooxidation step in the methods of the 
invention. Exemplary accessory polypeptides include ligases, transferases, dehydrogenases, 
and the like. Although both shuffled and non-shuffled polypeptides can be used, preferred 
accessory polypeptides are those that have been shuffled. 

The non-monooxygenase polypeptides can be used at any step of a pathway 
of the invention. In a preferred embodiment, they will be used to further transform the 
oxidation product. Although it will generally be preferred to utilize oxidized substrates that 
are produced by a monooxygenase of the invention, those of skill will appreciate that these 
routes can be practiced with analogous substrates that are, for example, chemically 
synthesized, commercially available, etc. 

Moreover, the present invention provides methods using both the improved 
accessory peptides and unimproved accessory peptides to further elaborate the 
monooxygenase-mediated reaction product. The method includes contacting the product of 
the monooxygenase-mediated reaction with one or more of the accessory polypeptides. In a 
preferred embodiment, the product is contacted with an organism that expresses the 
accessory polypeptide(s). When the accessory polypeptides are improved polypeptides, they 
will generally be produced by the methods described herein. 

The improved monooxygenase and the accessory polypeptide(s) can be 
expressed by the same host cell, or they can be expressed by different host cells. In a 
preferred embodiment, the accessory polypeptide is an improved polypeptide. 

By utilizing accessory polypeptides, the present invention makes possible the 
synthesis of a great variety of industrially valuable compounds via the methods disclosed 
herein. 

1. Dehydrogenases 

In a preferred embodiment, an alcohol or diol is converted to an aldehyde or 
carboxylic acid by the action of a dehydrogenase. The substrate for the dehydrogenase is 
preferably the product of an improved oxygenase of the invention. 

Polynucleotides encoding many known dehydrogenases can be used as 
substrates for DNA shuffling. Exemplary dehydrogenases useful in practicing the present 
invention include, but are not limited to: 

[ECOALDB, ECAE000436, ECAE000239, D90780, D90781, ECOFUCO, 
ECOFUCO] dehydrogenase of Escherichia coli; [AF029734 and AF029733] 
dehydrogenase of Xanthobacter autotrophicus; [AREXOYGEN] 
dehydrogenase of Agrobacterium radiobacter; [AB003475] dehydrogenase of 
Deinococcus radiodurans; [AF034434, VIBTAGALDA] dehydrogenase of 
Vibrio cholerae; [D32049] dehydrogenase of Synechococcus sp.; [AE001154] 
dehydrogenase of Borrelia burgdorferi (BB0528); [ABY17825] 
dehydrogenase of Agaricus bisporus; [ASNALDAA] dehydrogenase of 
Aspergillus niger; [EMEALDA, EMEALCA] dehydrogenase of Aspergillus 
nidulans; [AF019635, PPU15151] dehydrogenase of Pseudomonas putida 
TOL plasmid, xylW, xylC; [AF031161] dehydrogenase of Pseudomonas sp. 
VLB120, (stdD); [PFSTYABCD] dehydrogenase of P. fluorescens, styD; 
[PPU24215] dehydrogenase of P. putida F1, p-cymene alcohol and aldehyde 
dehydrogenases. 

2. Conversion of hydroxyls and/or acids to esters 

In another preferred embodiment, there is provided a method for converting 
carboxylic acid and hydroxyl groups to adducts such as esters and ethers. Useful 
polypeptides include, for example, ligases and transferases (see, Fig. 4). For the purposes of 
the discussion below, these polypeptides are referred to as "adduct-forming" polypeptides. 

The adduct-forming polypeptides are useful for enhancing and controlling the 
production of biotransformation products. These polypeptides, which convert a diol, for 
example, to a monoacyl or monoglycosyl derivative, can enhance control over the 
regioselectivity of subsequent reactions (e.g., chemical dehydration). For example, the 
regioselectivity of chemical dehydration in certain cases can be controlled by converting the 
compounds to their diacyl derivatives by means of chemical reaction, and then selectively 
removing one of the acyl groups using a polypeptide of the invention. Alternatively, one 
can control the regioselectivity of the dehydration by using an esterase or a trans-acylase 
polypeptide to convert the compounds to monoacyl derivatives, preferably in the presence of 
an excess of another carboxylic acid ester. In addition, the isolation of certain products is 
simplified by their conversion to more hydrophobic species. For example, the acylation of a 
diol to the corresponding carboxylic ester provides for a more efficient recovery of such 
diols, in the form of an ester, by organic solvent extraction of the adduct. Preferred organic 
solvents are those that can be used in an immiscible biphasic organic-aqueous 
biotransformation with whole cells, whether in a batch or in a continuous mode. 

An adduct-forming polypeptide can be expressed by the same host cell that 
expresses the dioxygenase, dehydrogenase, racemase, etc., or it can be expressed by a 
different host cell. Moreover, an adduct-forming polypeptide can be a naturally occurring 
polypeptide, or it can be improved by the method of the invention. 

When the adduct-forming polypeptide is an improved polypeptide, in 
presently preferred embodiments, the polypeptide demonstrates increased efficiency in the 
formation of the monoacyl- or monoglycosyl- derivatives of a desired compound (e.g., a 
glycol, carboxylic acid, etc.). Other improved adduct-forming polypeptides include 
transferases and ligases that can selectively modify only one of the hydroxyl groups of a 
diol, thus providing a means for controlling the regioselectivity of dehydration of such 
derivatives to either of two possible isomeric α-hydroxycarboxylic acid compounds. 

a. Acyltransferases 

One class of enzymes useful in practicing the present invention is the 
acyltransferases. These polypeptides can be evolved to enhance certain catalytic properties 
of the encoded polypeptides, such as specificity for a particular hydroxyl and/or acid, and 
enantiomeric and/or diastereomeric selectivity. 

More specifically, these polypeptides catalyze acyl transfer reactions as 
shown in Fig. 4. Acyltransferases are ubiquitous in nature, and many organisms (e.g., 
microbes, plants, mammals, etc.) can be used as sources of genes encoding these 
polypeptides. No matter their origin, the acyltransferase genes are preferably selected from 
those encoding functional polypeptides that catalyze active (CoA) ester transfer reactions in 
the biocatalytic processes described herein. Preferred acyltransferase genes are selected 
from those encoding functional polypeptides catalyzing reactions of small non-biopolymeric 
molecules. 

Examples of various acyltransferases useful in the present invention include 
polypeptides that catalyze the methylation of α-hydroxycarboxylic acids. Exemplary 
polynucleotides that can be recruited for this purpose are listed below by the 
corresponding GenBank identification: 

[AF043464] acetyl-CoA:benzylalcohol acetyltransferase of Clarkia breweri, 
and benzoyl-CoA:benzyl alcohol acetyltransferase present in the same 
organism (Dudareva et al., Plant Physiol. 116(2):599-604 (1998)); 
[DCANTHRAN, DCHCBT1, DCHCBT1A, DCHCBT1B, DCHCBT2, 
DCHCBT3] hydroxycinnamoyl/benzoyl-CoA:anthranilate N-acyltransferase 
of Dianthus caryophyllus; [E08840] homoserine O-acetyltransferase of 
Acremonium chrysogenum; [E12754] anthocyanin 5-aromatic acyltransferase 
of Gentiana triflora; [HUMBCAT] branched chain acyltransferase (human, 
J03208, J04723); [MG396, D02_orf152(lacA), MJ1064(lacA), MJ1678, 
MTH1067] galactoside 6-O-acetyltransferase, EC 2.3.1.18, lacA of E. coli, 
B0342(lacA), or of other organisms; [B3607(cysE), HI0606(cysE), 
HP1210(cysE), SLR1348(cysE)] serine O-acetyltransferase, EC 2.3.1.30; 
[YGR177C, YOR377W] alcohol O-acetyltransferase, EC 2.3.1.84, of 
Saccharomyces cerevisiae; [e.g., Q00267, D90786, Z92774, I78931, AF030398, 
AF008204, AF042740] arylamine N-acetyltransferase, EC 2.3.1.118; 
[YAR035(YAT1), YM8054.01(CAT2)] carnitine O-acetyltransferase, EC 
2.3.1.7, of mammalian origin or from yeast; [CHAT] choline O- 
acetyltransferase, EC 2.3.1.6, of mammalian origin; acetyl 
CoA:deacetylvindoline 4-O-acetyltransferase (EC 2.3.1.107) (St-Pierre et al., 
Plant J. 14(6):703-713 (1998)); and [ECOPLSC] 1-acyl-sn-glycerol-3- 
phosphate acyltransferase (plsC) of Escherichia coli. 

b. Acyl CoA ligases 

In another embodiment, an accessory polypeptide having acyl CoA ligase 
activity is provided. 

The specificity of acyl-CoA ligases towards a particular exogenous substrate 
or a group of substrates is preferably optimized by screening or selecting for the acylation of 
a substrate by shuffled and co-expressed acyl-CoA ligases and acyltransferases. Utilizing 
these polypeptides in tandem allows the combined effect of both polypeptides to be 
exploited. 

To illustrate the family or single gene shuffling approach to improving acyl- 
CoA ligases or acyltransferases, one or more members of the corresponding 
superfamilies of these polypeptides are selected, aligned with similar homologous 
sequences, and shuffled against these homologous sequences. 
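
Selecting homologs for such a family typically relies on a measure of sequence identity over an alignment. The sketch below shows one common way to compute it, under stated assumptions: the sequences are already aligned (equal length, "-" for gaps), columns containing a gap are excluded, and the 60% cut-off is an arbitrary illustrative value. The function names are hypothetical; other definitions of percent identity exist.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def select_family(query, candidates, minimum=60.0):
    """Return names of aligned candidates at or above `minimum` percent identity."""
    return [
        name
        for name, seq in candidates.items()
        if percent_identity(query, seq) >= minimum
    ]


if __name__ == "__main__":
    # Invented toy alignment, not real ligase sequences.
    query = "ACGTACGT"
    candidates = {
        "homolog_1": "ACGTACGA",
        "homolog_2": "AC-TACGT",
        "unrelated": "TTTTTTTT",
    }
    print(select_family(query, candidates))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch covers only the identity calculation and the selection step.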

An exemplary list of useful acyl-CoA ligase genes for inclusion into an 
organism of the invention is provided below: 

[AF029714, ECPAA, AJ000330, PSSTYCATA] phenylacetate-CoA ligase,
EC 6.2.1.30; [Y11070, Y11071] phenylpropionate-CoA ligase;
[B2260(menE), SLR0492(menE), SAU51132(menE)] O-succinylbenzoate-CoA
ligase, EC 6.2.1.26; [RPU75363, RBLBADA, AA532705, AA664442,
AA497001, AF042490, ARGFCBABC] (chloro)benzoate-CoA ligase, EC
6.2.1.25; [SBU23787, VPRNACOAL, POTST4CL1, RIC4CL2R, OS4CL,
AF041051, AF041052, GM4CL14, GM4CL16, LEP4CCOALA,
LEP4CCOALB, PC4CL1A, PC4CL1AA, PC4CL2A, PC4CL2AA,
TOB4CCAL, TOBTCL2, TOBTCL6, ECO110K, AF008183, AF008184,
AF041049, AF041050, ATU18675, NTU5084, NTU50846, PTU12013,
PTU39404, PTU39405, ATF13C5, ORU61383, AF064095, AA660600,
AA660679, STMPABA] 4-coumarate-CoA ligase EC 6.2.1.12; [RPU02033]
4-hydroxybenzoate-CoA ligase; [PSPPLAS] 2-aminobenzoate-CoA ligase.

In some embodiments of the invention, a carboxylic acid is fed exogenously
to the organism that expresses the ligase or transferase. Preferably, the carboxylic acid is
selected from those compounds that cannot be altered by the polypeptide used to produce the
substrate acted upon by the adduct forming polypeptide. Such carboxylic acids include, for
example, both substituted and non-substituted benzoic acid, phenylacetic acid, naphthoic acid,
phenylpropionic acid, phenoxyacetic acid, cycloalkanoic acids, carboxylic acids derived from
terpenes, pivalic acid, substituted acrylic acids, and the like.

To facilitate the utilization of exogenously supplied carboxylic acids, and for
enhancing the variety of compounds suitable for use in this process, the invention also
provides microorganisms in which one or more mutations are introduced. Preferred
mutations are those that effectively block metabolic modifications of such acids beyond their
conversion to a suitable active ester (e.g., as a derivative of coenzyme A). Such mutations in
the host organism can be introduced by classical mutagenesis methods, by site-directed
mutagenesis, by whole genome shuffling, and other methods known to those of skill in the
art. One can also introduce mutations that minimize host endogenous esterase activity.

In a presently preferred embodiment, the acyl transferase-encoding nucleic 
acids used as substrates for creating recombinant libraries encode polypeptides that transfer 
an acetyl group from an endogenous pool of acetyl-CoA in the cells of the host. The 
endogenous pools of acetyl-CoA can also be enhanced by DNA shuffling of an acetyl-CoA 
ligase and by supplying an exogenous acetate in the medium. 

While using acetyl-CoA transferases or other acyltransferases or
glycosyltransferases does not necessarily require expression of a corresponding acetyl-CoA or
other ligase, in a presently preferred embodiment, the organisms produce a sufficient amount
of an acyl-CoA ligase so as to activate the carboxylic acids to CoA thioesters, which in turn
serve as substrates for acyl-CoA transferases that utilize the oxidation products as substrates.
The specificity of an acyl-CoA ligase towards a desired exogenous carboxylic acid can be
optimized using the recombination and screening/selection methods of the invention.
Preferably, the screening or selecting is performed using co-expressed acyl-CoA ligases and
acyltransferases, thus permitting one to screen on the basis of the combined effect of both
polypeptides in the pathway for provision of monoacylated derivatives of the oxidation
products.

Nucleic acids that encode acyl-CoA ligases and other acyltransferases useful 
as substrates for the recombination and selection/screening methods of the invention include, 
for example, one or more members of the superfamilies of these polypeptides. In a presently 
preferred embodiment, the nucleic acids are selected, aligned with similar homologous 
sequences, and shuffled against these homologous sequences. 

c. Glycosyltransferases

Similarly, one or more glycosyltransferases can be expressed by the host cells
of the invention. Alternatively, one or more glycosyltransferases can be selected from the
glycosyltransferase superfamily, aligned with similar homologous sequences, and shuffled
against these homologous sequences. Glycosyl transfer reactions are ubiquitous in nature,
and one of skill in the art can isolate such genes from a variety of organisms, using one or
more of several art-recognized methods. The following are illustrative examples of
glycosyltransferase-encoding nucleic acids that can be used as substrates for creation of the
recombinant libraries. The libraries are then screened to identify those polypeptides that
exhibit an improvement in the glycosylation of compounds such as alcohols, diols and
α-hydroxycarboxylic acids:

[EC 2.4.1.123] inositol 1-α-galactosyltransferase; [NTU32643, NTU32644]
phenol β-glucosyltransferase, EC 2.4.1.35; flavone 7-O-beta-glucosyltransferase,
EC 2.4.1.81; [AB002818, ZMMCCBZ1, AF000372,
AF028237, AF078079, D85186, ZMMC2BZ1, WUFGT] flavonol
3-O-glucosyltransferase, EC 2.4.1.91; o-dihydroxycoumarin
7-O-glucosyltransferase, EC 2.4.1.104; vitexin beta-glucosyltransferase, EC
2.4.1.105; coniferyl-alcohol glucosyltransferase, EC 2.4.1.111; monoterpenol
beta-glucosyltransferase, EC 2.4.1.127; arylamine glucosyltransferase, EC
2.4.1.71; sn-glycerol-3-phosphate 1-galactosyltransferase, EC 2.4.1.96;
[RNUDPGTR, AA912188, AA932333] glucuronosyltransferase, EC 2.4.1.17;
the human UGT and isoenzymes (~35 genes); salicyl-alcohol
glucosyltransferase, EC 2.4.1.172; 4-hydroxybenzoate
4-O-beta-D-glucosyltransferase, EC 2.4.1.194; zeatin O-beta-D-glucosyltransferase, EC
2.4.1.203; [VFAUDPGFTA] D-fructose-2-glucosyltransferase; and
[MBU41999] ecdysteroid UDP-glucosyltransferase (egt).

In presently preferred embodiments, the glycosyltransferases are selected
from those which transfer hexose residues from UDP-hexose derivatives. Preferred hexoses
include, for example, D-glucose, D-galactose and D-N-acetylglucosamine.



d. Methyltransferases

In a still further preferred embodiment, the host cells of the present invention
express a polypeptide capable of converting a carboxylic acid to a carboxylic acid methyl
ester. Presently preferred polypeptides include methyltransferases.

For the purpose of this invention, genes encoding S-adenosylmethionine-dependent
methyltransferases are preferred. In a preferred embodiment, these polypeptides
are evolved to enhance selected properties of the encoded polypeptides, such as specificity
for a particular substrate, enantiomeric and/or diastereomeric selectivity, and/or solvent
resistance.

More specifically, these polypeptides can be evolved to catalyze the
O-methylation of carboxyl groups of a carboxylic acid substrate, thus forming the corresponding
methyl esters. Methyltransferases are ubiquitous in nature, and many organisms (e.g.,
microbes, plants, mammals, etc.) can be used as sources of genes encoding these
polypeptides. No matter their origin, the methyltransferase genes are preferably selected
from those which encode functional polypeptides that catalyze the methylation of small
non-biopolymeric molecules. Preferably, the methyltransferases are those which act on the
carboxyl groups of organic acids.

Examples of various methyltransferases that can be expressed by host cells of
the invention and which are useful for DNA shuffling-based directed evolution of
polypeptides catalyzing the methylation of carboxylic acids are listed below by the
corresponding GenBank identification:

[SCCCAGC3] methyltransferase of Streptomyces clavuligerus
methyltransferase CmcJ; [SEERYGENE] methyltransferase of S. erythraea
methyltransferases; [SEU77454] methyltransferase of Saccharopolyspora
erythraea; erythromycin O-methyltransferase (eryG); [SGY08763]
methyltransferase of S. griseus; [SKZ86111] methyltransferase of S. lividans;
[STMDNRDKP] methyltransferase of Streptomyces peucetius; carminomycin
O-methyltransferase (dnrK); [MDAJ39670] methyltransferase of
Streptomyces ambofaciens; [SEY14332] methyltransferase of
Saccharopolyspora erythraea; [SPU10405] methyltransferase of
Streptomyces purpurascens ATCC 25489; [STMDAUA] methyltransferase of
Streptomyces sp.; aklanonic acid methyltransferase (dauC), and
carminomycin 4-O-methyltransferase (dauK); [SC2A11 and SC3F7]
methyltransferase of Streptomyces coelicolor; [SHGCPIR] methyltransferase
of S. hygroscopicus; [STMCARMETH] methyltransferase of Streptomyces
peucetius carminomycin 4-O-methyltransferase; [STMODPOMT]
methyltransferase of Streptomyces alboniger O-demethylpuromycin-O-methyltransferase
(dmpM); [STMTCREP]; methyltransferase of
Streptomyces glaucescens; [SLLMRBG] methyltransferase of S. lincolnensis
lmrB methyltransferase; [SSU65940] 31-O-demethyl-FK506
methyltransferase (fkbM) of Streptomyces sp.; [STMDAUABCE] aklanonic
acid methyltransferase (dauC) of Streptomyces sp.; [STMMDMBC]
O-methyltransferase (mdmC) of Streptomyces mycarofaciens; [STMTYLF]
macrocin-O-methyltransferase (tylF) of S. fradiae; [E08176] gene of
mycinamicin III-O-methyltransferase; [AF040571] methyltransferase of
Amycolatopsis mediterranei; [ECU56082] S-adenosylmethionine:2-demethylmenaquinone
methyltransferase (menG) of Escherichia coli;
[RHANODABC] methyltransferase (nodS) of Azorhizobium caulinodans;
[YSCSTE14] isoprenylcysteine carboxyl methyltransferase (STE14) of
Saccharomyces cerevisiae; [YSCMTSW] farnesyl cysteine carboxyl-methyltransferase
(STE14) of Saccharomyces cerevisiae; [YSCDHHBMET]
3,4-dihydroxy-5-hexaprenylbenzoate methyltransferase (COQ3) of
S. cerevisiae; [AF004112 and AF004113] phospholipid methyltransferases
(choH), (cho2+) of Schizosaccharomyces pombe; [ASNOMT, ASNOMTIA,
ASNOMTIB, ASNOMTIC and AF036808-AF036830] O-methyltransferases
of Aspergillus; [MSU20736] S-adenosyl-L-methionine:trans-caffeoyl-CoA
3-O-methyltransferase of Medicago sativa; [ALFIOM] isoliquiritigenin
2'-O-methyltransferase of Medicago sativa; [MSU20736]
S-adenosyl-L-methionine:trans-caffeoyl-CoA 3-O-methyltransferase (CCOMT) of
Medicago sativa; [MSAF000975] 7-O-methyltransferase (7-IOMT(6)) of
Medicago sativa; [MSAF000976] 7-O-methyltransferase (7-IOMT(9)) of
Medicago sativa; [MSU97125] isoflavone-O-methyltransferase of Medicago
sativa; [NTCCOAOMT] caffeoyl-CoA O-methyltransferase of Nicotiana
tabacum; [NTZ82982] caffeoyl-CoA O-methyltransferase 5 of N. tabacum;
[NTDIMET] o-diphenol-O-methyltransferase of N. tabacum; [PCCCOAMTR,
PUMCCOAMT] trans-caffeoyl-CoA 3-O-methyltransferase of Petroselinum
crispum; [PTOMT1] caffeic acid/5-hydroxyferulic acid O-methyltransferase
(PTOMT1) of Populus tremuloides; [PBTAJ4894-PBTAJ4896] caffeoyl-CoA
3-O-methyltransferases of Populus balsamifera subsp. trichocarpa;
[ZEU19911] S-adenosyl-L-methionine:caffeic acid 3-O-methyltransferase of
Zinnia elegans; [SLASADEN] S-adenosyl-L-methionine:trans-caffeoyl-CoA
3-O-methyltransferase of Stellaria longipes; [VVCCOAOMT] caffeoyl-CoA
O-methyltransferase of V. vinifera; [D88742] O-methyltransferase of
Glycyrrhiza echinata; [AF046122] caffeoyl-CoA 3-O-methyltransferase
(CCOMT) of Eucalyptus globulus; [ATCOQ3]
dihydroxypolyprenylbenzoate methyltransferase of Arabidopsis thaliana;
[CSJSALMS90] S-adenosyl-L-methionine:scoulerine 9-O-methyltransferase
of Coptis japonica; [HVU54767] caffeic acid O-methyltransferase
(HvCOMT) of Hordeum vulgare; [MCU63634] inositol methyltransferase
(Imt1) of Mesembryanthemum crystallinum; [PSU69554]
6a-hydroxymaackiain methyltransferase (hmm6) of Pisum sativum; [CAU83789]
O-diphenol-O-methyltransferase of Capsicum annuum; [U16794] 3' flavonoid
O-methyltransferase (fom::) of Chrysosplenium americanum; [CBU86760]
SAM:(Iso)eugenol O-methyltransferase (IEMT1) of Clarkia breweri; salicylic
acid carboxyl SAM-O-methyltransferase (Dudareva et al., Plant Physiol.
116(2):599-604 (1998)); [HSHIOMT9] hydroxyindole-O-methyltransferase
(HIOMT) of Homo sapiens; [HSCOMT2] gene catechol O-methyltransferase
of Homo sapiens; [HUMPNMTA] phenylethanolamine N-methyltransferase
gene of Homo sapiens; [HUMCOMTA] catechol-O-methyltransferase of
Homo sapiens; [HUMCOMTC] catechol-O-methyltransferase of Homo
sapiens; [HUMPNMT] phenylethanolamine N-methyltransferase of Homo
sapiens; [AF064084] prenylcysteine carboxyl methyltransferase (PCCMT) of
Homo sapiens; [HUMCMT] carboxyl methyltransferase of Homo sapiens;
[HUMHNMA] histamine N-methyltransferase of Homo sapiens;
[RATCATAA, RATCATAB] catechol-O-methyltransferase of R. norvegicus;
[RATDHNPBMT] dihydroxypolyprenylbenzoate methyltransferase of Rattus
norvegicus; [BOVPNMTB] bovine phenylethanolamine
N-methyltransferase; [MPEMT7] phosphatidylethanolamine-N-methyltransferase 2
of Mus musculus; [MMU86108] nicotinamide
N-methyltransferase (NNMT) of Mus musculus; [MUSCMT] carboxyl
methyltransferase protein of mouse; [GDHOMT]
hydroxyindole-O-methyltransferase of G. domesticus; [DRU37434] L-isoaspartate (D-aspartate)
O-methyltransferase (PCMT) of Danio rerio; [DMU37432] protein
D-aspartyl, L-isoaspartyl methyltransferase of Drosophila melanogaster; and
[HAU25845 and HAU25846] farnesoic acid O-methyltransferases of
Homarus americanus.



3. Epoxide hydrolases 

In a still further preferred embodiment, the present invention provides a
nucleic acid encoding a polypeptide capable of converting a particular epoxide to the
corresponding diol.

Presently preferred polypeptides include epoxide hydrolases. Many epoxide 
hydrolases are known, and these enzymes have various substrate specificity and 
enantioselectivity. Examples of prokaryotic genes encoding epoxide hydrolases suitable for 
effecting epoxide hydrolysis relevant to this invention include, but are not limited to, 
[CAJ4332] Corynebacterium sp.; and [ARECHA] Agrobacterium radiobacter (echA). 

In a presently preferred embodiment, the polypeptide has one or more
improved properties brought about by shuffling methods described herein. Thus, the nucleic
acids encoding this gene, and any homologs thereof, are subjected to DNA shuffling to
evolve polypeptides having improved or optimal performance and specificity towards
particular substrates such as α-hydroxycarboxylic acids. In a preferred embodiment, the
polypeptide has a performance and/or specificity that is enhanced over the wild type.
Preferred polypeptides act on α-hydroxycarboxylic acid substrates, such as those displayed
in Fig. 3.

4. Enantiomeric interconversion

In a still further preferred embodiment, the present invention provides a
nucleic acid encoding a polypeptide capable of converting a particular enantiomer of a chiral
compound such as an alcohol, diol or α-hydroxycarboxylic acid or a precursor or analogue
thereof to its antipode.

Presently preferred polypeptides include racemases, such as the mandelate
racemase of Pseudomonas putida (PSEMDLABC). These polypeptides can be expressed by
hosts of the invention in their natural form or, alternatively, they can be evolved to enhance
certain catalytic properties of the encoded polypeptides, such as specificity for a particular
substrate and enantiomeric and/or diastereomeric selectivity.

The nucleic acid encoding the mandelate racemase of Pseudomonas putida,
which catalyzes the interconversion of mandelate R and S enantiomers, is a typical preferred
example of genes selected for use in this invention. The nucleic acids encoding this gene,
and any homologs thereof, are subjected to DNA shuffling to evolve polypeptides having
improved or optimal performance and specificity towards particular substrates such as
α-hydroxycarboxylic acids. In a preferred embodiment, the polypeptide has a performance
and/or specificity that is enhanced over the wild type. Preferred polypeptides act on
α-hydroxycarboxylic acid substrates, such as those displayed in Fig. 3.

5. α-Ketocarboxylic acid decarboxylase

Several thiamine phosphate-dependent polypeptides of this class are known to
occur in bacteria, fungi and yeast (see, Iding et al., Biochim. Biophys. Acta 1358:307-22
(1998)). For the purpose of illustration, a gene encoding a well-known decarboxylase,
preferably a benzoylformate decarboxylase (mdlC) of Pseudomonas putida [PSEMDLABC],
is shuffled to increase the specific activity towards α-ketocarboxylic acids, such as
o-hydroxybenzalpyruvate. Alternatively, genes encoding pyruvate decarboxylases (EC
4.1.1.1), indole-3-pyruvate decarboxylases (EC 4.1.1.74) or phenylpyruvate decarboxylases
(EC 4.1.1.43) from a variety of sources can be used.

6. Solvent resistance polypeptides

The invention also provides organisms expressing one or more of the
improved polypeptides of the invention and that are also resistant to solvents, organic
substrates and reaction products (e.g., epoxides, glycols, α-hydroxyaldehydes,
α-hydroxycarboxylic acids and α-hydroxycarboxylic acid derivatives (e.g., esters)) according
to the methods of the invention.

The solvent resistance of organisms and polypeptides used in the biocatalytic
conversion of organic compounds is important for enhancing the productivity of such
processes. Increased solvent resistance of the organisms can enhance longevity, viability
and catalytic activity of the microbial cells, and can simplify the administration of the
feedstock compounds to the reactor and the recovery or separation of desired products by
means of, for example, continuous or semi-continuous liquid-liquid extraction.

In another aspect, the invention provides microbial cells that are useful in the
synthetic methods described herein, which express proteins conferring resistance to solvents
(in particular, organic solvents) upon the microbial cells. This allows the use of whole
microbial cells in an organic-aqueous mixture (e.g., a biphasic mixture). In presently
preferred embodiments, the invention provides microbial strains including at least two of the
polypeptide systems described herein. For example, a microorganism of the invention can
contain both a dioxygenase gene and a transferase gene. In other embodiments, the
microorganism can contain both an arene dioxygenase gene and a solvent resistance gene.
The microbial cells thus provide a significant improvement in productivity of the synthesis
processes, selectivity of product formation, operational simplicity and ease of product recovery,
and minimize any by-product streams.

Several microorganisms are known to possess high resistance to hydrophobic
compounds such as benzene and lower alkylbenzenes. Recently, genes encoding a solvent
efflux pump (srpABC) have been identified in Pseudomonas putida strains (Kieboom et al.
J. Biol. Chem. 273:85-91 (1998)). Similarly, various genes that encode polypeptides that
confer organic solvent resistance can be found in bacterial strains such as Pseudomonas
putida GM73 (Kim et al. J. Bacteriol. 180: 3692-3696 (1998)), Pseudomonas putida DOT-T1E
(Ramos et al. J. Bacteriol. 180: 3323-3329 (1998)), and Pseudomonas idaho (Pinkart and
White J. Bacteriol. 179: 4219-4226 (1997)). These and other genes, such as those that
encode many proton-dependent multidrug efflux systems, e.g., MexA-MexB-OprM,
MexC-MexD-OprJ, and MexE-MexF-OprN of Pseudomonas aeruginosa (Li et al. J. Bacteriol.
180: 2987-2991 (1998)), or the tolC, acrAB, marA, soxS, and robA loci of Escherichia coli
(Aono et al. J. Bacteriol. 180:938-944 (1998); White et al., J. Bacteriol. 179:6122-6126
(1997)), and in many other microorganisms, can be used to confer solvent resistance upon a
host microbial strain used in the oxidative biocatalytic conversion of olefins by means of
action of monooxygenases or dioxygenases.

In presently preferred embodiments, the ability of a polypeptide to confer
solvent resistance is enhanced by subjecting nucleic acids encoding solvent resistance
polypeptides, or the genomes of the microorganisms themselves, to the recombination and
selection/screening methods described herein. The nucleic acids listed above, as well as
similar genes, provide a source of substrates for incorporation into organisms of the
invention and/or use in DNA shuffling and other methods of constructing libraries of
recombinant polynucleotides. The libraries can then be screened to identify those nucleic
acids that encode polypeptides conferring improved solvent tolerance on a host. For
example, one can select for improved tolerance to compounds such as olefins, AHAs,
aldehydes, esters and hydrophobic solvents, including alkanes, cycloalkanes, alcohols and
halocarbon derivatives, for example, which are used for performing biotransformation (e.g.,
two-phase oxidation) of olefins to glycols, AHAs and to their corresponding acyl- and
glycosyl- derivatives, etc. Similarly, DNA shuffling of nucleic acids that encode these
polypeptides can be used to confer and to improve resistance of the microbial cell to high
concentrations of biotransformation substrates, intermediates and endproducts, thus
improving biocatalyst performance and productivity.
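
The screening step described above can be sketched as a simple ranking of library members against the parental strain. This is an illustration only: the function name select_improved, the fold-improvement threshold and the survival fractions below are hypothetical values chosen for the example and are not data from this disclosure.

```python
def select_improved(variants, parent_score, min_fold=1.5):
    """Return (name, fold_improvement) pairs for variants beating the parent.

    `variants` maps a clone name to its measured score (for example, the
    fraction of cells surviving exposure to a solvent); results are sorted
    with the most improved clone first.
    """
    if parent_score <= 0:
        raise ValueError("parent_score must be positive")
    hits = []
    for name, score in variants.items():
        fold = score / parent_score
        if fold >= min_fold:
            hits.append((name, fold))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical survival fractions after solvent exposure.
parent_survival = 0.10
library = {"v1": 0.08, "v2": 0.25, "v3": 0.16, "v4": 0.40}
hits = select_improved(library, parent_survival)
```

Clones passing the threshold would typically serve as parents for a further round of recombination and screening.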

In addition to each of the methods set forth above, the present invention 
provides polypeptides produced according to these disclosed methods. Moreover, the 
invention provides organisms that express the polypeptides produced by the method of the 
invention. The organisms of the invention can express one or more of the improved 
polypeptides. Also provided by the present invention are methods of synthesizing a desired 
compound. This method includes contacting an appropriate substrate with a polypeptide of 
the invention. In a preferred embodiment, the substrate is contacted with an organism of the 
invention that expresses a polypeptide of the invention.

D. Methods of Using Improved Polypeptides to Prepare Organic Compounds

In addition to the methods discussed above, the present invention provides a
range of methods for preparing useful organic compounds by the oxidation and further
elaboration of appropriate precursors. Among the methods provided by the present
invention are, for example, the oxidation of alkylarene compounds to the corresponding
unsaturated diols and the subsequent dehydration of these diols to hydroxyalkylarenes.
Additionally, there is provided an analogous method for preparing hydroxylated aromatic
carboxylic acids. Moreover, the invention provides methods for preparing cyclic, exocyclic
and/or acyclic diols from molecules having alkene bonds. The exocyclic and acyclic diols
can be readily converted to α-hydroxycarboxylic acids.

The reaction types and sequences set forth below are illustrative of the scope
of the invention. The monooxygenases of the invention are capable of oxidizing any organic
substrate comprising an oxidizable moiety. Additional reaction sequences utilizing the
polypeptides of the invention will be apparent to those of skill in the art.

1. Preparation of epoxides

In a preferred embodiment, there is provided a method for converting an
olefin into an epoxide. The polypeptide of the invention is designed to be functional with
substantially any olefinic substrate; however, in a preferred embodiment, the polypeptide
acts on at least one alkene group of a substrate that includes:

to produce an epoxide product having the structure: 

wherein R' and are independently selected from H, alkyl, substituted alkyl, aryl,
substituted aryl, heteroaryl, substituted heteroaryl, heterocyclyl, substituted heterocyclyl,
-NR'R\(R')m, -OR', -CN, C(R*)NR'R* and C(R*)OR' groups. R\ and R' are
members independently selected from the group consisting of H, alkyl, substituted alkyl,
aryl, substituted aryl, heteroaryl, substituted heteroaryl, heterocyclyl and substituted
heterocyclyl groups. is selected from =O and =S. m is 0 or 1, such that when m is 1, an
ammonium salt is provided.

In a still further preferred embodiment, the olefinic substrate is selected from
2-vinylpyridine, 4-vinylpyridine, 3-butenenitrile, vinylacetamide, N,N-dialkyl
vinylacetamide, diallylamine, triallylamine, diallyldimethylammonium salts, styrene and
phenyl-substituted styrene.

2. Preparation of vicinal diols

The formation of vicinal diols by oxidizing a π-bond using a monooxygenase
of the invention and hydrolyzing the resulting epoxide provides ready access to a wide array
of compounds that are useful as both final products and as intermediates in multi-step
reaction pathways. The monooxygenases of the invention are capable of converting to
epoxides and, thus, to vicinal diols an array of structurally distinct compounds comprising
one or more π-bonds.

Although the method can be practiced with essentially any π-bond, in
essentially any compound, in a preferred embodiment, the method includes preparing a
vicinal diol group by contacting a substrate comprising a carbon-carbon double bond with an
improved monooxygenase polypeptide, or an organism expressing an improved
monooxygenase polypeptide, to form an epoxide. The epoxides are cleaved by chemical or
enzymatic action.

In another preferred embodiment, the substrate comprising the carbon-carbon
π-bond is selected from styrene, substituted styrene, divinylbenzene, substituted
divinylbenzene, isoprene, butadiene, diallyl ether, allyl phenyl ether, substituted allyl phenyl
ether, allyl alkyl ether, allyl aralkyl ether, vinylcyclohexene, vinylnorbornene, and acrolein.

In yet another preferred embodiment, the vicinal diol produced by the action
of the improved monooxygenase polypeptide has the structure:




wherein and R' are independently selected from alkyl, substituted alkyl, aryl, substituted
aryl, heteroaryl, substituted heteroaryl, heterocyclyl, substituted heterocyclyl, -NR2R3,
-OR^, -CN, C(R'')NR^R^ and C(R'')OR^ groups, or R' and R^ are joined to form a ring
system selected from saturated hydrocarbyl rings, unsaturated hydrocarbyl rings, saturated
heterocyclyl rings and unsaturated heterocyclyl rings; and are members independently
selected from H, alkyl, substituted alkyl, aryl, substituted aryl, heteroaryl, substituted
heteroaryl, heterocyclyl and substituted heterocyclyl groups; R"* is selected from =O and =S;
R^ and R^ are independently selected from H and alkyl; and n is a number from 0 to 10,
inclusive.

In certain preferred vicinal diols R* is selected from phenyl, substituted
phenyl, pyridyl, substituted pyridyl, -NR^R\, -OR\, -CN, C(R')NR^R' and C(R')OR^
groups, R^ and R^ are members independently selected from H, alkyl, substituted alkyl, aryl,
substituted aryl, heteroaryl, substituted heteroaryl, heterocyclyl and substituted heterocyclyl
groups; and R"* is selected from =O and =S.

In another preferred embodiment, the diol includes a six-member ring having 
at least one endocyclic double bond and at least one substituent selected from methyl, 
carboxyl and combinations thereof. 

3. Dehydrogenation of ROH groups

In another preferred embodiment, the invention provides a class of improved
P-450 polypeptides that dehydrogenate hydroxyl-containing substrates. Although
substantially any hydroxyl-containing substrate can be dehydrogenated using the
polypeptides of the invention, in a preferred embodiment, the substrate is:

{CH(R'')(CH2)sR''}t
{CH(R'')(CH2)nR''}p

wherein the R groups are independently selected from H and OH and at least one of
the R groups is OH; n and s are independently selected from the numbers 0 to 16;
and p and t are independently selected from 0 to 6, wherein at least one of p and t must be at
least one. The enzyme of the invention, preferably, converts at least one hydroxyalkyl group
to a member selected from:

-COOH and -C(O)H.

In another preferred embodiment, the substrate is selected from among
toluene and xylene and the polypeptide converts at least one methyl group to a
carboxylic acid or a carbonyl.

4. Preparation of α-hydroxycarboxylic acids

In another preferred embodiment, there is provided a method for converting
an olefin to an α-hydroxyaldehyde or an α-hydroxycarboxylic acid. In a preferred
embodiment, the olefin is converted to an α-hydroxycarboxylic acid. The method includes:
(a) contacting the olefin with an improved monooxygenase polypeptide of the invention to
form an epoxide; (b) hydrolyzing the epoxide to form a vicinal diol; and (c) contacting the
vicinal diol with a dehydrogenase polypeptide to form the α-hydroxycarboxylic acid.
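
As a worked illustration of steps (a) to (c), the sketch below follows one possible substrate, styrene, through styrene oxide and 1-phenylethane-1,2-diol to mandelic acid, and computes the mass of product expected from a given mass of olefin. The choice of styrene as the example, the names PATHWAY, molar_mass and theoretical_yield_g, and the per-step yields are assumptions made for this example only; the yields are not measured values.

```python
# Standard atomic masses (g/mol), rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

# (compound, elemental composition) for each stage of the example route.
PATHWAY = [
    ("styrene", {"C": 8, "H": 8}),                      # olefin substrate
    ("styrene oxide", {"C": 8, "H": 8, "O": 1}),        # step (a) product
    ("1-phenylethane-1,2-diol", {"C": 8, "H": 10, "O": 2}),  # step (b) product
    ("mandelic acid", {"C": 8, "H": 8, "O": 3}),        # step (c) product
]


def molar_mass(formula):
    """Molar mass in g/mol from an element -> count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def theoretical_yield_g(substrate_g, step_yields):
    """Mass of final product given substrate mass and fractional step yields."""
    overall = 1.0
    for fraction in step_yields:
        overall *= fraction
    moles_substrate = substrate_g / molar_mass(PATHWAY[0][1])
    return moles_substrate * overall * molar_mass(PATHWAY[-1][1])


# Hypothetical fractional yields for steps (a), (b) and (c).
product_g = theoretical_yield_g(10.0, [0.9, 0.95, 0.8])
```

With all three step yields set to 1.0, the function returns the stoichiometric maximum, roughly 1.46 g of mandelic acid per gram of styrene.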

As in other methods involving the hydrolysis of the epoxide, the epoxide can
be hydrolyzed using chemical or enzymatic means. The hydrolysis is preferably mediated
by an improved epoxide hydrolase prepared using the methods of the invention. The
dehydrogenase polypeptides useful in this embodiment can be naturally occurring
polypeptides or, alternatively, they can be polypeptides improved using the methods of the
invention. When more than one polypeptide is used to effect a particular transformation they
can be expressed in the same host organism or in different host organisms.

α-Hydroxycarboxylic acids (AHAs) are an important group of industrial
chemicals. One of the simplest representatives of this class of compounds is lactic acid.
Lactic acid is used for many purposes, including the synthesis of polyester polymers (e.g.,
polylactic acid). In addition to the lactic acid homopolymer, lactic acid can be
copolymerized with other α-hydroxycarboxylic acids, such as mandelic acid, to form
copolymers. Enantiomerically pure hydroxycarboxylic acids are also used as
resolving reagents for separating mixtures of chiral molecules. α-Hydroxycarboxylic acids
are generated chemically by a variety of general methods that are less than ideal. For
example, a commonly used method, hydrolysis of a cyanohydrin, is problematic. The
cyanohydrins are produced by the addition of HCN to an aldehyde. Aldehydes are relatively
expensive starting materials and the hydrolysis of the cyanohydrins to the corresponding
α-hydroxycarboxylic acids does not proceed in an enantioselective manner. This necessitates
the disposal or recycling of a substantial portion of the costly aldehydes.

Chiral lactic acid has been manufactured by means of a microbial
fermentative process using a carbohydrate feedstock. At present, this fermentative
methodology does not provide a means for making AHAs other than lactic acid. A great
number of useful AHAs have a structure wherein the lactic acid methyl group is replaced
with another substituent such as, for example, aromatic, alicyclic or alkenic moieties useful
for subsequent chemical modifications of either the AHAs themselves, or of polymers or
copolymers incorporating these AHAs.

A promising route to the highly selective manufacture of chiral AHAs is
based on the oxidation of olefins by means of a monooxygenase polypeptide of the
invention. These polypeptides can be isolated and used in vitro or, alternatively, they can be
used in vivo by using whole microbial cells displaying the appropriate polypeptide activity.
Moreover, dioxygenase polypeptides also have useful activity. The preparation of
α-hydroxycarboxylic acids utilizing dioxygenases is disclosed in U.S.S.N. , bearing
Attorney Docket No. 018097-031100, entitled "Shuffling of Dioxygenase Genes for
Production of Industrial Chemicals", filed on an even date herewith and incorporated by
reference in its entirety.

The present invention also provides improved polypeptides that exhibit an
enhanced ability to convert a range of substrates to α-hydroxycarboxylic acids,
α-hydroxycarboxylic acid precursors and analogues by processes employing oxidative
biocatalysis. Methods are provided for generating polynucleotides that encode enzymes that
catalyze these reactions and that have improved properties. Presently preferred substrates
include olefins.

Biocatalytic methods that employ the recombinant polypeptides provided by
the present invention have several significant advantages over previously available methods
for the synthesis of α-hydroxy acids, their precursors and analogues. For example, the
invention provides polypeptides that can increase the amount of product produced in a
reaction, as well as increase the enantiomeric excess and/or regiospecific formation of the
product. The enhanced properties that are obtained using the methods include
enhanced forward rate kinetics, altered substrate specificity and affinity, enhanced
regioselectivity and enantioselectivity, and decreased susceptibility to inhibitors and
inactivation by substrates, intermediates and products.
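
As an illustrative aside, the enantiomeric excess mentioned above is conventionally computed from the amounts of the two enantiomers as |R - S| / (R + S), expressed in percent. The following is a minimal Python sketch of that arithmetic only; the function name and the sample amounts are hypothetical and are not taken from this specification.

```python
def enantiomeric_excess(r_amount: float, s_amount: float) -> float:
    """Return the enantiomeric excess, in percent, of a two-enantiomer mixture.

    Uses the conventional definition ee = |R - S| / (R + S) * 100.
    The amounts may be in any consistent unit (e.g. mmol or peak area).
    """
    total = r_amount + s_amount
    if total <= 0:
        raise ValueError("total amount of enantiomers must be positive")
    return abs(r_amount - s_amount) / total * 100.0


# Hypothetical amounts, for illustration only.
print(enantiomeric_excess(50.0, 50.0))  # a racemate has no excess
print(enantiomeric_excess(95.0, 5.0))   # an enriched mixture
```

A racemate gives a value of zero and a single pure enantiomer gives 100, so an improved polypeptide in the sense used here would move the product toward the upper end of that range.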

As is generally true for the other aspects and embodiments of the present
invention, the recombinant polypeptides of the invention are preferably expressed by an
organism, such as a microbial cell, that carries out the biocatalysis. Accordingly, the invention
also provides organisms that are adapted for efficient biocatalytic manufacturing of
α-hydroxycarboxylic acids, their analogues and their precursors. The microorganisms
preferably express one or more recombinant polypeptides that are optimized for the
biocatalysis pathway of interest. The biocatalytic polypeptides that are expressed by the
microbial cells can be wild type or they can be recombinant polypeptides that exhibit
improved properties and are encoded by the recombinant nucleic acids obtained using the
methods of the invention. In a preferred embodiment, the organism expresses at least two
enzymes selected from an improved monooxygenase, an epoxide hydrolase and a dehydrogenase.
Either or both of the epoxide hydrolase and the dehydrogenase can be an improved
polypeptide.

In yet another embodiment, a nucleic acid encoding a polypeptide that
converts a vicinal glycol to an α-hydroxyaldehyde and/or an α-hydroxycarboxylic acid is
provided. For the purpose of this invention, the genes encoding dehydrogenase polypeptides
for conversion of the glycols to α-hydroxyaldehydes and/or to α-hydroxycarboxylic acids
can be selected from many known dehydrogenases.

In another preferred embodiment, the method of the invention is used to convert
olefinic and vicinal diol precursors to α-hydroxycarboxylic acids having the structure:

[Structure diagram not recoverable from the source; surviving labels: OH, HOOC, n]

wherein R¹ is selected from aryl, substituted aryl, heteroaryl, substituted heteroaryl,
heterocyclyl, substituted heterocyclyl, -NR²R³, -OR², -CN, C(R⁴)NR²R³ and C(R⁴)OR² groups;
R² and R³ are members independently selected from H, alkyl, substituted alkyl, aryl,
substituted aryl, heteroaryl, substituted heteroaryl, heterocyclyl and substituted
heterocyclyl groups; R⁴ is selected from =O and =S; and n is a number between 0 and 10,
inclusive.

In a still further preferred embodiment, R¹ is selected from phenyl, substituted
phenyl, pyridyl, substituted pyridyl, -NR²R³, -OR², -CN, C(R⁴)NR²R³ and C(R⁴)OR²
groups; R² and R³ are members independently selected from H, alkyl, substituted alkyl, aryl,
substituted aryl, heteroaryl, substituted heteroaryl, heterocyclyl and substituted heterocyclyl
groups; and R⁴ is selected from =O and =S.

In yet another preferred embodiment, the invention provides a method for
altering or controlling the regiospecificity of the dehydrogenation reaction. This method
"blocks" one of the vicinal diol hydroxyl groups by forming an ester, for example. The
method includes contacting the vicinal diol with a microorganism comprising an improved
polypeptide having an activity selected from ligase, transferase and combinations thereof,
thereby forming an α-hydroxycarboxylic acid adduct. As with the other polypeptides
discussed above, this polypeptide can be expressed by the same host cell that expresses other
polypeptides of the reaction cascade. Moreover, this polypeptide can be a naturally
occurring polypeptide, or it can be improved using the method of the invention.

a. α-Hydroxycarboxylic acid adducts

AHAs are bifunctional molecules with two chemically and enzymatically
distinguishable functional groups, carboxyl and hydroxyl. In the biocatalytic modifications
of AHAs described in this invention, either of these groups can be derivatized by bond
formation. While these reactions do not change the oxidation state of the AHA molecule,
recruitment of the enzymes effecting modification of AHAs provides the opportunity to
generate biotransformation endproducts with physical and chemical properties substantially
different from those of a free AHA. Generally desirable properties include an increase of
hydrophobicity, a decrease of aqueous solubility and, for an ester formed through a
carboxylic group of an AHA, a decrease in acidity of the process end-products.

In a preferred embodiment, the adduct-forming polypeptide produces an
α-hydroxycarboxylic acid adduct selected from esters and ethers. The method includes
contacting an α-hydroxycarboxylic acid with a polypeptide having an activity selected from
ligase, transferase and combinations thereof, thereby forming an α-hydroxycarboxylic acid
adduct. The adduct-forming polypeptides useful in this embodiment can be naturally
occurring polypeptides or, alternatively, they can be polypeptides improved using the
methods of the invention, as discussed generally above.

Exemplary adduct-forming reactions are provided in Fig. 4. This Figure
shows the use of a methyltransferase to convert carboxylic acid (X) to the corresponding
methyl ester (XI), acyltransferase I to convert X to ester XIII, and acyl-CoA ligase to
convert X to intermediate XIV. This intermediate can then be transformed into a simple
alkyl ester (IX) or to structures having greater complexity in the alcohol-derived
component (e.g., XV). Species such as XV can be further elaborated using other
polypeptides including, for example, acyltransferase III to produce compound XVII,
thioesterase II to produce compound XVIII and thioesterase I to produce compound XVI.

In a further preferred embodiment, the α-hydroxycarboxylic acid adduct has
the structure:

[Structure diagram not recoverable from the source; surviving label: OR]

wherein R¹ is selected from aryl, substituted aryl, heteroaryl, substituted heteroaryl,
heterocyclyl, substituted heterocyclyl, -NR²R³(R⁴)m, -OR², -CN, C(R⁵)NR²R³ and
C(R⁵)OR² groups; R², R³ and R⁴ are members independently selected from the group
consisting of H, alkyl, substituted alkyl, aryl, substituted aryl, heteroaryl, substituted
heteroaryl, heterocyclyl and substituted heterocyclyl groups; R⁵ is selected from =O and =S;
R⁶ is selected from H, alkyl and substituted alkyl groups; R⁷ is C(O)R⁸, wherein R⁸ is
selected from H, alkyl and substituted alkyl groups, and R⁶ and R⁷ are not both H; m is 0 or 1,
such that when m is 1, an ammonium salt is provided; and n is a number between 0 and 10,
inclusive.

In yet another preferred embodiment, R¹ is selected from phenyl, substituted
phenyl, pyridyl, substituted pyridyl, -NR²R³, -OR², -CN, C(R⁵)NR²R³ and C(R⁵)OR²
groups; R² and R³ are members independently selected from the group consisting of H,
C1-C6 alkyl and allyl; and R⁵ is =O.

In yet another preferred embodiment of this invention, the described reactions
and pathways are utilized for biocatalytic whole-cell conversion of styrene to mandelic acid
and its ester derivatives. The pathway for styrene conversion, all of its intermediates and
reactions are shown in Fig. 2.

The esterified adducts provide an increase in the overall efficiency of the
biotransformation process as they simplify end-product recovery. The esters are easily
isolated by organic solvent extraction and partitioning. Moreover, the adducts obviate the
need for pH adjustment in the aqueous fermentation media to prevent the accumulation of
high levels of acidic biotransformation products.

There are several biochemically distinct means by which AHAs can be
biocatalytically esterified in a substantially aqueous environment. In one preferred
embodiment of this invention, expression of genes encoding an S-adenosylmethionine
(SAM)-dependent O-methyltransferase is used to effect conversion of AHAs to their methyl
esters (e.g., Fig. 4, conversion of compound X to compound XI). SAM-dependent
methyltransferases of differing substrate specificity are common in nature, and suitable
enzymes and corresponding genes can be found and used directly for the purpose of this
invention. Alternatively, these species can be further evolved and optimized for specific
activity with the AHAs using one or more nucleic acid shuffling methods described herein.
The invention also provides means for HTP screening for the presence, and quantitative
determination, of the AHA-specific O-methyltransferase catalytic activities in
microorganisms, cells, tissues or extracts of tissues of higher eukaryotic organisms. These
methods can be used either to identify sources of corresponding genes or to evolve the
desired specificity of known methyltransferases towards the AHAs by means of DNA
shuffling described herein.

In another embodiment, acyltransferase enzymes which specifically esterify
the sec-hydroxyl of AHAs by means of active carboxyl transfer from either acyl-coenzyme
A or acylated acyl carrier protein (ACP) are incorporated into the reaction pathway. This
pathway is depicted in Fig. 4, as shown by the coupling of compounds X and XII to yield
compound XIII. A preferred embodiment of this pathway involves recruiting and
expressing gene(s) encoding acyl-CoA-dependent acyltransferases, including those which
utilize as substrates acetyl-CoA and CoA derivatives of fatty acids, as well as lactoyl-CoA,
CoA-thioesters with other AHAs, and CoA derivatives of aromatic, arylalkanoic and
branched-chain alkanoic carboxylic acids, and alpha-amino acids. Where carboxylic acids
(either in the form of free acid, salt or ester), intended for esterification of AHAs, are supplied
exogenously, or are co-produced by another co-functioning biotransformation or
fermentative pathway in the same host organism, or a different host organism, the invention
provides a means for facilitating ester formation by recruiting and co-expressing those
acyl-CoA ligases or ACPs which effect in vivo activation of these acids, forming suitable
substrates for the acyltransferase enzymes that act on the AHAs.

The invention also provides for another type of biochemical transformation of
AHAs to AHA carboxylic esters wherein free AHAs are first converted to their active ester
form by means of the enzymatic formation of a derivative with CoA or ACP (Fig. 4,
compound XIV). Several alternative acyltransferase enzymes (and genes encoding them)
can be recruited for effecting subsequent transformations of compound XIV to esters of
different compositions. These preferably include AHA-CoA transferases acting (a) on
alcohols (XX) to produce esters (IX), or (b) on a molecule of compound XIV or compound
XV to produce acyclic homo- and hetero-oligomers (n=2-5) of AHAs. By recruiting
additional thioesterase enzymes, the activated forms of these oligomeric esters can be
converted to free carboxylic oligomers (e.g., XVIII) or to the cyclic substituted glycolides
(XVI).

In another preferred embodiment, the formation of an α-hydroxycarboxylic
acid ester is catalyzed by an acyl-CoA ligase that is evolved by nucleic acid shuffling. In a
preferred embodiment, shuffling of nucleic acids encoding acyl-CoA ligase activities results
in an increase in the synthesis of esters. In another preferred embodiment, the esters are
selected from structures XIII-XVIII (Fig. 4). The synthesis of these and other esters will
generally rely on the provision of a corresponding α-hydroxycarboxylic acid precursor. In a
preferred embodiment, the α-hydroxycarboxylic acid precursor is present in an amount
sufficient to establish intracellular pools of CoA-activated carboxylic derivatives of
α-hydroxycarboxylic acids.

In still another preferred embodiment, the transferase polypeptide is selected
from glycosyltransferase and methyltransferase, more preferably methyltransferase and more
preferably still an S-adenosylmethionine-dependent O-methyltransferase.

5. Enzymes effecting chiral switch at the level of AHAs

Another object of this invention is the effective control of the enantiomeric
composition of the compounds prepared by the methods of the invention. For clarity of
illustration, the discussion below focuses on AHA esters made by the biotransformation
process from alkenes. This focus is intended to be illustrative and not limiting of the scope
of this embodiment of the invention.

A means of enantiomeric control, when integrated as part of the multistep
biocatalytic pathway, constitutes an important advantage as it allows selective production of
either enantiomer of the AHA. The enantiomerically pure AHAs can be used as resolving
reagents, chiral synthons, or monomers for polyesters or co-polyesters with lactic acid.

In a preferred embodiment, the AHA is mandelic acid, or an analogue thereof,
and the chiral switch is effected by recruiting a mandelate racemase gene.

Mandelate racemase catalyzes the interconversion of the R and S enantiomers
of mandelic acid and its derivatives. An exemplary mandelate racemase is that of
Pseudomonas putida (the sequence of the gene can be found in the GenBank database under
the locus [PSEMDLABC]). Preferred mandelate racemases are those of the P. putida strain
ATCC 12633; however, mandelate racemases from any other organism can be used.

Although, in a preferred embodiment, the chiral switch is made at the level of
the AHA, this switch can be made with any of the precursors or adducts of the AHA as well.
Thus, in yet another preferred embodiment, the AHA is modified by at least one of the
ester-forming enzymes discussed herein. Preferred ester-forming enzymes are those which
specifically, or preferentially, act on one enantiomer of the AHA, thus allowing
enantiospecific resolution of the racemate in vivo. The activity of the above racemases
provides an enantiomeric equilibrium at the expense of the non-esterified enantiomer. The
combined action of the racemase and the AHA esterifying enzymes provides a chiral switch
which allows preparation of one desired enantiomer, whether R or S, from AHAs of any
enantiomeric composition.
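
The combined action described above can be sketched numerically. The following minimal Python example is an illustration only, under stated assumptions: simple first-order kinetics, a racemase that interconverts R and S with one rate constant, and an esterifying enzyme that acts on the R enantiomer alone. The function name, the rate constants and the starting amounts are hypothetical and are not taken from this specification.

```python
def simulate_chiral_switch(r0, s0, k_rac, k_est, t_end=200.0, dt=0.01):
    """Integrate a toy first-order model with explicit Euler steps.

    r, s   : free R and S enantiomers of the AHA (arbitrary units)
    k_rac  : racemase rate constant (R <-> S), hypothetical value
    k_est  : rate constant of an R-selective esterifying enzyme, hypothetical
    Returns the final (r, s, ester) amounts.
    """
    r, s, ester = r0, s0, 0.0
    for _ in range(int(round(t_end / dt))):
        rac_flux = k_rac * (r - s)   # net racemase flux from R to S
        est_flux = k_est * r         # only the R enantiomer is esterified
        r += (-rac_flux - est_flux) * dt
        s += rac_flux * dt
        ester += est_flux * dt
    return r, s, ester


# Start from a racemate (equal amounts of R and S), total amount 1.0.
_, s_left, ester_no_racemase = simulate_chiral_switch(0.5, 0.5, k_rac=0.0, k_est=1.0)
_, _, ester_with_racemase = simulate_chiral_switch(0.5, 0.5, k_rac=1.0, k_est=1.0)
print(ester_no_racemase, s_left)   # ester yield is capped by the R content
print(ester_with_racemase)         # racemase feeds S into the R pool
```

In this toy model the selective enzyme alone can esterify only the R half of a racemate, whereas adding the racemase lets essentially all of the material end up as the single ester enantiomer, which is the behavior the paragraph above attributes to the chiral switch.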

6. Hydroxylation of organic substrates 

The monooxygenase polypeptides of the invention are capable of
hydroxylating substantially any substrate comprising a terminal methyl, internal methylene
or π-bond group. These substrates include, for example, alkyl, substituted alkyl, aryl,
substituted aryl, heteroaryl, substituted heteroaryl and the like. Other appropriate substrates
will be apparent to those of skill in the art.

In a preferred embodiment, the substrate has the structure:

[Structure diagram not recoverable from the source; surviving label: (R)n]

wherein each of the n R groups is a member selected from the group consisting of H, alkyl
groups and substituted alkyl groups; m is a number from 0 to 10, inclusive; and n is a
number from 0 to 5, inclusive.

In another preferred embodiment, the substrate includes benzene substituted
with a member selected from the group of straight-chain alkyl groups, branched-chain alkyl
groups and combinations thereof. The substituent is more preferably a member selected
from C1-C6 straight-chain alkyl, C1-C6 branched-chain alkyl and combinations thereof, and
even more preferably, ethyl, n-propyl, i-propyl, t-butyl and combinations thereof.

In another preferred embodiment, the substrate has the structure:

[Structure diagram missing from the source]

wherein n is a number between 0 and 9, inclusive.

In yet another preferred embodiment, the substrate has the structure:

[Structure diagram not recoverable from the source; surviving label: (CH3)n]

wherein n is an integer from 1 to 6.

Presently preferred products of these oxidation reactions include benzyl
alcohol, substituted benzyl alcohol, 2-phenylethanol, substituted 2-phenylethanol,
3-phenylpropanol, substituted 3-phenylpropanol and their derivatives.

In a still further preferred embodiment, the substrate includes a member
selected from 3,4-dihydrocoumarin and 3,4-dihydrocoumarin residues and the polypeptide
converts a methylene group of the substrate to -CH(OH)-.

In yet another preferred embodiment, the substrate is 3,4-dihydrocoumarin
and the polypeptide converts the substrate to 4-hydroxy-3,4-dihydrocoumarin.

7. Preparation of hydroxylated aromatic carboxylic acids

Hydroxylated aromatic carboxylic acids have many diverse uses, including as
antimicrobial additives, UV protectants (e.g., esters of p-hydroxybenzoic acid, parabens) and
pharmaceutical compositions (e.g., esters of salicylic acid, coumarins and
3,4-dihydroxycoumarin).

Thus, in another preferred embodiment, the present invention provides a
method for preparing hydroxylated aromatic carboxylic acids. The method includes
contacting a substrate comprising an aryl carboxylic acid with a dioxygenase polypeptide of
the invention. The polypeptide is preferably expressed by an organism of the invention.

a. Carboxylic acid substrates

The carboxylic acids used as substrates in the present invention can be
obtained from commercial sources, or they can be prepared by methods known in the art. In
a preferred embodiment, the carboxylic acids are prepared by contacting a substrate
comprising an aryl alkyl group with an oxygenase polypeptide to produce the corresponding
aryl alkyl alcohol. The alcohol is subsequently acted upon by a dehydrogenase polypeptide
to produce the desired carboxylic acid. Alternatively, the alcohol can be converted to the
carboxylic acid by chemical means.

For clarity of illustration, the discussion herein focuses on the oxidation of
arylmethyl groups to carboxylic acids. This focus is intended to be illustrative and not
limiting.

(i). Alkyl group monooxygenation

The first step in the biotransformation processes for conversion of alkylaryl
compounds, such as toluene and isomeric xylenes, includes the selective oxidation of at least
one methyl group present in the aromatic substrate to the corresponding carboxylic acid
(e.g., benzoic, toluic acids). In an exemplary embodiment, the substrate is a p- or an
m-xylene and, preferably, only one of the methyl groups is oxidized.

Following the oxygenation step, the resulting alcohol is dehydrogenated,
generally by the action of a dehydrogenase polypeptide, to produce the desired carboxylic
acid.

The invention provides for polypeptides that selectively oxidize only one
alkyl group of an arene bearing two or more alkyl substituents. In an exemplary
embodiment, xylene is converted to a monocarboxylic acid. Alternatively, the invention
provides polypeptides that are capable of oxidizing more than one alkyl substituent of a
species substituted with two or more alkyl groups. In contrast to the selective polypeptides
described above, certain polypeptides of the invention are capable of oxidizing both of the
methyl substituents of a xylene to produce the corresponding benzenedimethanol (4a).

In a preferred embodiment, the monooxygenation/dehydrogenation pathway
produces a carboxylic acid having the structure:

[Structure diagram not recoverable from the source; surviving label: COOH]

wherein each of the n R groups is independently selected from H, alkyl and substituted alkyl
groups; and n is a number from 1 to 5, inclusive, more preferably R is methyl, and more
preferably still, n is a number from 1 to 3, inclusive.

In a still further preferred embodiment, the carboxylic acid is selected from:

[Structure diagrams not recoverable from the source; surviving label: CH3]

Many enzymes for effecting these reactions are well known in the art, and are
suitable for use in the construction of useful polypeptides and host strains. To achieve the
initial oxidation of the methyl groups, certain enzymes are presently preferred, including
non-heme multicomponent monooxygenases of toluene and xylenes, and of p-cymene, as well
as certain arene dioxygenases which act on these substrates in a monooxygenase mode. The
latter are exemplified by naphthalene dioxygenase, 2-nitrotoluene 2,3-dioxygenase and
2,4-dinitrotoluene 4,5-dioxygenase. These dioxygenases do not oxidize the aromatic ring of
methylbenzenes, but are capable of oxidizing methyl groups of a variety of
aromatic compounds in a monooxygenase mode (Selifonov et al., Appl. Environ. Microbiol.
62(2):507-514 (1996); Lee et al., Appl. Environ. Microbiol. 62(9):3101-3106 (1996);
Parales et al., J. Bacteriol. 180(5):1194-1199 (1998); Suen et al., J. Bacteriol.
178(16):4926-4934 (1996)). As with the other polypeptide activities discussed herein, the
ability of a dioxygenase to act as a monooxygenase is a property that can be optimized by
shuffling the nucleic acids encoding these dioxygenases.

The following list provides examples of polynucleotides that encode
dioxygenases acting as monooxygenases and which are suitable for use in the methods of the
invention. The loci are identified by GenBank ID and encode complete or partial protein
components of the arene dioxygenases. Suitable loci include:
[AB004059], [AF010471], [AF036940], [AF053735], [AF053736],
[AF079317], [AF004283], [AF004284], [PSENAPDOXA],
[PSENAPDOXB], [PSENDOABC], [PSEORF1], [PSU49496]
naphthalene-1,2-dioxygenase; [BSU62430] 2,4-dinitrotoluene dioxygenase; [PSU49504]
2-nitrotoluene dioxygenase.

The polypeptide that catalyzes the monooxygenation can be a naturally
occurring polypeptide, or it can have one or more properties that are improved relative to an
analogous naturally occurring polypeptide. In a preferred embodiment, the polypeptides are
expressed by one or more host organisms. Moreover, the polypeptide that catalyzes the
monooxygenation can be co-expressed by the same host expressing a polypeptide used for
further structural elaboration of the oxidation substrate or product (e.g., a dioxygenase
polypeptide that oxidizes the π-bond). Alternatively, the mono- and di-oxygenase
polypeptides can be expressed in different hosts.

(ii). Oxidation of alkylarenes having alkyl groups with > C2

Much of the discussion above highlighting pathway and organism
construction for oxidation of methylbenzenes is directly applicable to the set of processes
dealing with alkylbenzenes bearing other alkyl groups.

Thus, in a preferred embodiment, at least one alkyl group of the alkylarene
has at least two carbon atoms. Preferred species produced in the monooxygenation step (and
any subsequent structural elaboration) have the structure:

[Structure diagram not recoverable from the source; surviving labels: (R)m, (CH2)n-CO2H]

wherein each of the m R groups is selected from H, alkyl, substituted alkyl, aryl, substituted
aryl, heteroaryl, substituted heteroaryl, heterocyclyl and substituted heterocyclyl; m is a
number from 0 to 5, inclusive; and n is a number from 1 to 10, inclusive. Preferred aryl
groups are those substituted on the aryl group with at least one methyl moiety.

In another preferred embodiment, the compound has the structure:

[Structure diagram not recoverable from the source; surviving label: (CH2)n-CO2H]

wherein n is a number from 1 to 6, inclusive.

Generally, oxidation of C2 alkyl groups is best accomplished by expressing a
suitable cytochrome P450 type enzyme system. The enzymes of this class are ubiquitous in
nature, and they can be found in a variety of organisms. For example, n-propylbenzene is
known to undergo ω-oxidation in strains of Pseudomonas desmolytica S449B1 and
Pseudomonas convexa S107B1 (Jigami et al., Appl. Environ. Microbiol. 38(5):783-788
(1979)), which can utilize this hydrocarbon in either of two alternative oxidation pathways.

Similarly, alkane monooxygenases of bacterial origin, or cytochromes P450 for
camphor oxidation, both well known in the art and whether wild-type or mutant, can be
recruited for the purpose of introducing the oxygen at the terminal methyl group of
alkylarenes (Lee et al., Biochem. Biophys. Res. Commun. 218(1):17-21 (1996); van Beilen
et al., Mol. Microbiol. 6(21):3121-3136 (1992); Kok et al., J. Biol. Chem.
264(10):5435-5441 (1989); Kok et al., J. Biol. Chem. 264(10):5442-5451 (1989); Loida
et al., Protein Eng. 6(2):207-212 (1993)).

WO 00/09682    PCT/US99/18424

(iii). Oxygenation of arenes with exocyclic π-bonds

In another preferred embodiment, the starting material for the carboxylic acid
is an arene bearing an exocyclic π-bond. This class of compounds is exemplified by styrene.
Other analogous species are set forth in Fig. 3.

The conversion of the exocyclic π-bond is best accomplished by recruiting a
cluster of bacterial styrene oxidation genes well known in the art (Marconi et al., Appl.
Environ. Microbiol. 62(1):121-127 (1996); Beltrametti et al., Appl. Environ. Microbiol.
63(6):2232-2239 (1997); O'Connor et al., Appl. Environ. Microbiol. 63(11):4287-4291
(1997); Velasco et al., J. Bacteriol. 180(5):1063-1071 (1998); Itoh et al., Biosci.
Biotechnol. Biochem. 60(11):1826-1830 (1996)). Alternatively, the styrene epoxidation step
can be accomplished by using monooxygenases active towards methyl-substituted aromatic
compounds, such as toluene or xylenes (Wubbolts et al., Enzyme Microb. Technol.
16(7):608-615 (1994)).

(iv). Dehydrogenation 

To produce the desired carboxylic acid, the alcohol from (i-iii), above, is
preferably treated with a dehydrogenase polypeptide. The dehydrogenase enzyme can be
endogenous to a host that expresses one or more of the oxygenase polypeptides, or it can
exhibit properties that are improved relative to an endogenously expressed dehydrogenase.

The polypeptide that catalyzes the dehydrogenation can be a naturally
occurring polypeptide, or it can have one or more properties that are improved relative to an
analogous naturally occurring polypeptide. In a preferred embodiment, the polypeptides are
expressed by one or more host organisms. Moreover, the polypeptide that catalyzes the
dehydrogenation can be co-expressed by the same host expressing one or more of the
dioxygenase polypeptides. Alternatively, the dehydrogenase and oxygenase polypeptides can
be expressed in different hosts.

In yet another preferred embodiment, the invention provides a method for
altering or controlling the regiospecificity of the dehydrogenation reaction of a vicinal diol.
This method "blocks" one of the vicinal diol hydroxyl groups by forming an ester, for
example. The method includes contacting the vicinal diol with a polypeptide, preferably
expressed by a host organism, having an activity selected from ligase, transferase and
combinations thereof, thereby forming an α-hydroxycarboxylic acid adduct. As with the
other polypeptides discussed above, this polypeptide can be expressed by the same host cell
that expresses other polypeptides of the reaction cascade. Moreover, this polypeptide can be
a naturally occurring polypeptide, or it can be improved using the method of the invention.

b. Monooxygenation of aromatic π-bonds

In the synthesis of hydroxyaryl carboxylic acids using the methods of the
invention, once the carboxylic acid moiety is in place, the molecule is submitted to an arene
monooxygenation cycle (Fig. 1). The monooxygenation of the aromatic ring is preferably
accomplished by recruiting one or more monooxygenase genes, preferably of bacterial
origin. Exemplary monooxygenase genes are disclosed herein. The method of the invention
can be practiced using essentially any type of aromatic ring system. Exemplary aromatic
systems include benzenoid and fused benzenoid ring systems (e.g., benzene, naphthalene,
pyrene, benzopyran, benzofuran, etc.) and heteroaryl systems (pyridine, pyrrole, furan, etc.).
In a preferred embodiment, the substrate includes a benzenoid hydrocarbon.

Similar to the embodiments discussed above, in this embodiment, the
polypeptide that catalyzes the monooxygenation can be coexpressed with one or more
polypeptides used in a synthetic pathway. For example, the monooxygenase, dehydrogenase
and transferase polypeptides can all be coexpressed in a single host. Other functional
combinations of coexpression will be apparent to those of skill in the art.

3. Conversion of hydroxyls and/or acids to esters 

In another preferred embodiment, there is provided a method for converting
carboxylic acid and hydroxyl groups to adducts such as esters and ethers. Useful
polypeptides include ligases and transferases (see Fig. 4). For the purposes of the
discussion below, these polypeptides are referred to as "adduct-forming" polypeptides.

The adduct-forming polypeptides are useful for enhancing the production of
biotransformation products. These polypeptides, which convert a diol, for example, to a
monoacyl or monoglycosyl derivative, can enhance control over the regioselectivity of
subsequent reactions (e.g., chemical dehydration). For example, the regioselectivity of
chemical dehydration in certain cases can be controlled by converting the compounds to
their diacyl derivatives by means of chemical reaction, and then selectively removing one of
the acyl groups using a polypeptide of the invention. Alternatively, one can control the
regioselectivity of the dehydration by using an esterase or a trans-acylase polypeptide to
convert the compounds to monoacyl derivatives in the presence of an excess of another
carboxylic acid ester, in an essentially organic medium. In addition, acylation of diols, for
example, to obtain monocarboxylic esters provides advantages for efficient recovery of such
esters by means of organic solvent extraction, including by extraction with organic solvents
which may be used in an immiscible biphasic organic-aqueous biotransformation with whole
cells, whether in a batch or in a continuous mode.

An adduct-forming polypeptide can be expressed by the same host cell that
expresses the monooxygenase, dehydrogenase, racemase, etc., or it can be expressed by a
different host cell. Moreover, an adduct-forming polypeptide can be a naturally occurring
polypeptide, or it can be improved by the method of the invention.

When the adduct-forming polypeptide is an improved polypeptide, in
presently preferred embodiments, the polypeptides can, for example, demonstrate increased
efficiency in the formation of the monoacyl- or monoglycosyl- derivatives of a desired
compound (e.g., a glycol, carboxylic acid, etc.). Other improved adduct-forming
polypeptides include transferases and ligases that can selectively modify only one of the
hydroxyl groups of a diol, thus providing a means for control of regioselectivity of
dehydration of such derivatives to either of two possible isomeric α-hydroxycarboxylic acid
compounds.

4. Conversion of fatty acids to hydroxy acids 

In another preferred embodiment, there is provided a method for converting
fatty (preferably, alkanoic, n=3-20) acids to hydroxy acids. Monooxygenases are well
known to those skilled in the art to perform the oxidation of remote carbons in a fatty acid.
Improved polypeptides will have selectivity for the oxidation of any position in the chain.
These hydroxy acids can then be used as substrates for polymer formation.

D. Antioxidant and Impurity Modification and Detoxification

In another embodiment, the invention provides a means for degrading or
modifying organic materials which leads to their detoxification. Exemplary compounds
include stabilizing agents, antioxidizing agents, environmental pollutants and the like. This
method is applicable to substantially any compound that can be detoxified by, for example,
oxidation, either with or without additional structural elaboration. For clarity of illustration,
the discussion below focuses on the detoxification of agents commonly found in organic
solvents and in π-bonded compounds of use in the present invention.

Many commercially available compounds (e.g., alkylbenzenes, alkenes, etc.)
are stabilized with small amounts of antioxidants such as 4-tert-butylcatechol or
alkylphenols (e.g., BHT) to prevent polymerization during storage and transportation. While
the amount of these compounds is usually relatively small (10-15 ppm), they can inhibit
biocatalyst performance as they accumulate in aqueous fermentation medium during
prolonged incubations required to obtain satisfactory endproduct concentrations.

Several types of enzymes for modifying the phenolic stabilizing compounds
can be used to alleviate any negative effects of these compounds on the whole cell
biocatalyst performance. Their genes can be introduced in the same host organism used to
produce endproducts or intermediates of relevance to this invention. Alternatively, they can
be incorporated into a separate host organism. This obviates the need for additional steps in
the process which may be required in order to remove these stabilizers. Optimization of one
or several of these enzymes for the efficient removal of these stabilizing compounds is a
target for DNA shuffling.

Exemplary enzymes for modifying phenolic and diphenolic stabilizers
include, but are not limited to, acyltransferase, methyltransferase, glycosyltransferase, laccase
and peroxidase. In addition to these enzymes, catecholic stabilizers also can be modified to
innocuous products by catechol dioxygenases effecting meta- or ortho-ring cleavage. Many
of these enzymes show a significant breadth of activity towards compounds related to
phenolic stabilizers. Thus, DNA shuffling can be applied to optimize enzyme parameters
such as:

a) increased turnover with a particular phenolic stabilizer,
b) increased functional expression, by obviating the requirements for certain
post-translational modifications of those enzymes which require such modifications (e.g.,
glycosylation of peroxidases and laccases); and
c) alleviation of inhibition of these enzymes by high concentrations of co-occurring
feedstock compounds and intermediates and endproducts of the biocatalytic
process.
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E. Analytical Methodology 

A number of analytical techniques are useful in practicing the present
invention. These analytical techniques are used to measure the extent of conversion of a
particular substrate to product. These techniques are also used to analyze the regioselectivity
and/or the enantiomeric selectivity of a particular reaction catalyzed by a polypeptide of the
invention. Moreover, these techniques are employed to assess the effect of nucleic acid
shuffling experiments on the efficiency and selectivity of the polypeptides produced
following the shuffling. The discussion below focuses on those aspects and embodiments of
the invention in which an olefin precursor is oxidized by a monooxygenase. The analytical
techniques discussed in this context are generally of broad applicability to other aspects and
embodiments of the invention. This is particularly true of the spectroscopic and
chromatographic methods discussed below. Thus, in the interest of brevity, the following
discussion focuses on analyzing the products of the oxidation of an olefin, but the utility of
the methods discussed is not limited to this embodiment.



1. Selecting for Monooxygenase activity

Monooxygenase activity can be monitored by HPLC, gas chromatography
and mass spectroscopy, as well as a variety of other analytical methods available to one of
skill. The consumption of molecular oxygen by the monooxygenase can be measured using
an oxygen sensing system, such as an electrode. Incorporation of ¹⁸O from radio-labeled
molecular oxygen can be monitored directly by mass shift by MS methods and by an
appropriate radioisotope detector with HPLC and GC devices. For example, epoxidation of
1-hexadecene to 1,2-epoxyhexadecane can be monitored by ¹⁸O incorporation either in intact
whole cells or in lysate. This has been used, for example, by Bruyn et al. with Candida
lipolytica.

In addition, epoxide formation can be indirectly measured by various reactive 
colorimetric reactions. When H2O2 is used as the oxidant, disappearance of peroxide over 
time can be monitored directly either potentiometrically or colorimetrically using a number 
of commercially available peroxide reactive dyes. 

In a high-throughput modality, the method of choice is high-throughput MS,
or MS with an electrospray-based detection method. In addition, selection protocols in
which the organism uses a given alkane, alkene or epoxide as a sole carbon source can be
used. In some systems this will be most readily accomplished by combining the alkene
oxidizing polypeptide with an epoxide hydrolase to generate a metabolizable alcohol.

2. Automation for Strain Improvement 
One key to strain improvement is having an assay that can be dependably
used to identify a few mutants out of thousands that have potentially subtle increases in
product yield. The limiting factor in many assay formats is the uniformity of library cell (or
viral) growth. This variation is the source of baseline variability in subsequent assays.
Inoculum size and culture environment (temperature/humidity) are sources of cell growth
variation. Automation of all aspects of establishing initial cultures and state-of-the-art
temperature and humidity controlled incubators are useful in reducing variability.
In one aspect, library members, e.g., cells, viral plaques, spores or the like, are separated on
solid media to produce individual colonies (or plaques). Using an automated colony picker
(e.g., the Q-bot, Genetix, U.K.), colonies are identified, picked, and 10,000 different mutants
inoculated into 96 well microtitre dishes containing two 3 mm glass balls/well. The Q-bot
does not pick an entire colony but rather inserts a pin through the center of the colony and
exits with a small sampling of cells (or mycelia) and spores (or viruses in plaque
applications). The time the pin is in the colony, the number of dips to inoculate the culture
medium, and the time the pin is in that medium each affect inoculum size, and each can be
controlled and optimized. The uniform process of the Q-bot decreases human handling error
and increases the rate of establishing cultures (roughly 10,000/4 hours). These cultures are
then shaken in a temperature and humidity controlled incubator. The glass balls in the
microtitre plates act to promote uniform aeration of cells and the dispersal of mycelial
fragments similar to the blades of a fermenter.
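The throughput figures quoted above (10,000 mutants, 96 well plates, roughly 10,000 cultures per 4 hours) imply a plate count and a picking rate that can be worked out directly. The following is a minimal sketch of that arithmetic only; the function names are illustrative and it assumes one mutant per well with no control wells reserved.

```python
import math


def plates_needed(n_mutants: int, wells_per_plate: int = 96) -> int:
    """Plates required when every well receives exactly one mutant (assumption)."""
    return math.ceil(n_mutants / wells_per_plate)


def picking_rate_per_hour(n_cultures: int, hours: float) -> float:
    """Average number of cultures established per hour."""
    return n_cultures / hours


n_plates = plates_needed(10_000)                # 96 well plates for 10,000 mutants
rate = picking_rate_per_hour(10_000, 4.0)       # from "roughly 10,000/4 hours"
glass_balls = n_plates * 96 * 2                 # two 3 mm balls per well, all wells loaded
```

Reserving wells for parent-strain controls would raise the plate count accordingly.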

a. Prescreen 

The ability to detect a subtle increase in the performance of a shuffled library
member over that of a parent strain relies on the sensitivity of the assay. The chance of
finding the organisms having an improvement is increased by the number of individual
mutants that can be screened by the assay. To increase the chances of identifying a pool of
sufficient size, a prescreen that increases the number of mutants processed by 10-fold can be
used. The goal of the primary screen will be to quickly identify mutants having equal or
better product titres than the parent strain(s) and to move only these mutants forward to
liquid cell culture for subsequent analysis.
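The selection rule of the primary screen (keep mutants whose product titre is equal to or better than that of the parent) can be sketched as a simple filter. This is an illustration under stated assumptions, not a prescribed procedure: the parent threshold is taken as the mean of replicate parent wells, and all names and numbers are hypothetical.

```python
from statistics import mean


def prescreen(mutant_titres, parent_titres):
    """Return mutant ids with titre >= mean parent titre, best first.

    mutant_titres: dict mapping mutant id -> measured product titre
    parent_titres: list of titres from replicate parent-strain wells
    """
    threshold = mean(parent_titres)  # assumption: mean of parent replicates
    keep = [m for m, t in mutant_titres.items() if t >= threshold]
    return sorted(keep, key=lambda m: mutant_titres[m], reverse=True)


# Hypothetical titres in arbitrary units
selected = prescreen({"m1": 1.0, "m2": 1.2, "m3": 0.8}, [1.0, 1.0])
```

A stricter rule, such as requiring the titre to exceed the parent mean by a multiple of the parent standard deviation, may be preferable where baseline variability is large.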

In one preferred embodiment, the prescreen for P450 activity is a method for
measuring functional heme incorporation. Active P450 monooxygenases have an
absorbance at around 450 nm in the presence of carbon monoxide in a reducing
environment. Thus expression of the P450 library on an agar plate is followed by the
addition of a reducing solution, such as dithionite in water. This solution is then removed
and the plate is placed in a CO atmosphere. Colonies with increased absorbance at 450 nm
are picked as active cytochrome P450 enzymes. This screening process is general for all
P450 monooxygenases.
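The colony-picking rule described above can be sketched as a comparison of each colony's absorbance at 450 nm before and after exposure to CO. This is a minimal sketch: the per-colony readings, the colony identifiers and the `min_increase` cutoff are all hypothetical, and the text does not specify how the absorbance is read from the plate.

```python
def pick_active_colonies(a450_before, a450_after, min_increase=0.05):
    """Return ids of colonies whose A450 rose by at least min_increase under CO.

    a450_before / a450_after: dicts mapping colony id -> absorbance at 450 nm
    min_increase: hypothetical cutoff, to be set from parent-strain controls
    """
    return [
        cid
        for cid in a450_before
        if cid in a450_after and a450_after[cid] - a450_before[cid] >= min_increase
    ]


# Hypothetical readings for three colonies
before = {"A1": 0.10, "A2": 0.12, "A3": 0.11}
after = {"A1": 0.30, "A2": 0.13, "A3": 0.20}
active = pick_active_colonies(before, after)
```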

3. Selection for Redox Partners 

One target for the application of gene shuffling technologies is to evolve
monooxygenases to use cheaper, more practical redox partners. However, the complexities
of managing redox equivalents can be circumvented, in many cases, by using peroxides
(such as hydrogen peroxide) as co-substrates. For example, a monooxygenase capable of
oxidizing 1-octene to 1,2-epoxyoctane does so in a non-NAD(P)H-dependent manner when
H2O2 is added to the reaction mix. For peroxidases and chloroperoxidases this
peroxide-dependent, NAD(P)H-free oxidative chemistry is the norm. Peroxide-mediated oxidations,
however, often result in the rapid inactivation of catalytic activity by a variety of partially
understood mechanisms (see, Cytochrome P450: Structure, Mechanism, and
Biochemistry [2nd edition], P.R. Ortiz de Montellano, editor, New York: Plenum Press,
chapter 9; and Meunier, B. Chem. Rev. 92:1411-1456 (1992)). Enhancing the stability of
P450 enzymes in the presence of peroxides and increasing the overall turnover rates of these
enzymes with basic industrial raw materials is a feature of the invention.

Gene shuffling offers a means of generating new peroxidase and oxygenase
polypeptides with altered selectivity, activity or stability. Whereas peroxides are often
prohibitively expensive for use as oxidants for industrial chemistry, biological systems offer
the potential to generate and use peroxides in situ without isolation of the reactive
intermediates. The concepts disclosed here include the coevolution of a hydrogen
peroxide-generating system (such as glucose, galactose or alcohol oxidases) with a monooxygenase
polypeptide capable of using the peroxide generated to synthesize an oxidized coproduct. In
this context, peroxides can be commercially feasible oxidizing agents for even low-value,
high-volume commodity chemicals.

4. Screening for improved monooxygenase activity. 

In each of the aspects and embodiments discussed below, the concept of
screening the library of recombinant polypeptides to enable the selection of improved
members of the library is set forth. Although it will be apparent to those of skill in the art
that many screening methodologies can be used in conjunction with the present invention,
the invention provides a screening process comprising:

(a) introducing the library of recombinant polynucleotides into a
population of test microorganisms such that the recombinant polynucleotides are expressed;

(b) placing the organisms in a medium comprising at least one substrate;
and

(c) identifying those organisms exhibiting an improved property
compared to microorganisms without the recombinant polynucleotide.

a. Oxidation of olefins 

Depending on the specific outcome desired from a particular course of DNA 
shuffling of nucleic acids encoding oxygenases for biocatalytic oxidation of olefins, the 
invention provides several methods for detecting and measuring catalytic properties encoded 
by the recombinant polynucleotides. These are exemplified by the following methods. 

For the purpose of the optimization of individual reactions and whole
pathways, the production of α-hydroxycarboxylic acids, their derivatives, analogues and
precursor compounds described in this invention can be monitored by virtually any analytic
technique known in the art. In preferred embodiments, the production of the desired
compound is monitored using one or more techniques selected from thin layer
chromatography (TLC), high performance liquid chromatography (HPLC), chiral HPLC,
mass spectrometry, mass spectrometry coupled with a chromatographic separation modality,
NMR spectroscopy, radioactivity detection from radioactively labeled compounds (e.g.,
olefins, diols, aldehydes, AHAs, etc.), scintillation proximity assays, and UV
spectroscopy. In a high throughput modality, the preferred methods are selected from one or
any combination of these methods.
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The methods of the invention are used to improve polypeptides that catalyze 
the initial oxidation of π-bonded species. Methods using monooxygenase-based pathways
are encompassed herein. The oxidation product from the conversion of a substrate
comprising a π-bond (e.g., arenes, alkylarenes, alkenes, etc.) can be detected by numerous
methods well known to those of skill in the art. Certain preferred methods are set forth 
herein. 

In a preferred embodiment, the vicinal diol derived from oxidation of an
olefin is quantitated using a radioactively labeled substrate. Although any radioactive
isotope commonly used in the art can be incorporated into a substrate, preferred isotopic
labels include, for example, ¹⁴C and/or ³H. Differences in the volatility of the olefin
substrate and the corresponding diol can be exploited to quantitate the radioactively labeled
product. This method can easily be applied to aqueous samples of culture fluids obtained by
incubating individual clones of cells expressing libraries of a recombinant polynucleotide
obtained using the methods of the invention.

In an exemplary embodiment, cells expressing libraries of recombinant 
polynucleotides encoding a monooxygenase can be grown in a multiwell dish with a 
radioactive substrate administered directly to the aqueous medium. After incubation of the 
cells with the radioactive olefin substrate, any residual unconverted substrate is removed by
evaporation, with or without application of vacuum. After removing the unconverted 
substrate, the culture fluid (or aliquots thereof) is mixed with a suitable scintillation cocktail, 
and the radioactivity in the samples is quantitatively measured. In a preferred embodiment, 
selection of the most active clones is based on the amount of radioactivity incorporated into 
the compounds produced by the organisms expressing the clone. 
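Selecting the most active clones from the scintillation counts described above amounts to ranking wells by retained radioactivity. The sketch below assumes that a no-cell blank well is counted alongside the clones and subtracted as background; that blank, the clone names and the counts are hypothetical and are not specified in the text.

```python
def rank_clones(cpm_by_clone, blank_cpm, top_n=3):
    """Rank clones by background-corrected counts per minute, highest first.

    cpm_by_clone: dict mapping clone id -> counts per minute after evaporation
    blank_cpm: counts from a no-cell control well (assumed background)
    """
    corrected = {c: max(cpm - blank_cpm, 0.0) for c, cpm in cpm_by_clone.items()}
    ranked = sorted(corrected.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


# Hypothetical plate readings
best = rank_clones({"c1": 1500, "c2": 300, "c3": 5200, "c4": 90}, blank_cpm=100, top_n=2)
```

Because unconverted olefin is removed by evaporation before counting, the corrected counts are taken here as a proxy for the non-volatile oxidation products.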

Alternatively, radioactively labeled substrate can be administered as a vapor
phase to colonies growing on a surface of a membrane filter overlaying agar-solidified
medium. After incubation, the membrane is removed from the agar surface, and any residual
hydrocarbon is evaporated from the membrane. The membrane is autoradiographed, or a
scintillation dye is sprayed over the membrane for radioactivity detection. A modification of
this assay that is particularly suitable for ¹⁴C label detection in and/or around colonies
capable of oxidizing π-bonds to the corresponding glycols involves using a porous
membrane that has scintillation dye incorporated in the membrane composition by covalent
or adsorption means. This assay is termed "scintillation proximity assay on membrane" or
"SPA."
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In another embodiment of this invention, a variation of SPA is used to 
selectively quantify the glycol derived from the substrate. This variation involves adding 
beads for scintillation proximity assay to the samples of culture fluids or extracts obtained by 
incubation of cells with radiolabeled substrate as described above. Alternatively, the sample 
can be applied to a membrane. The beads or membrane are functionalized with groups that
interact with a glycol.

In a preferred embodiment of this assay, the beads or membranes contain a
suitable scintillating dye and their surfaces are modified by chemical groups that interact
readily with diols. Such materials can be prepared by known chemical methods from
commercially available SPA materials and they can be used to trap free diols directly in the
aqueous medium or culture broths obtained by incubation of the microbial cells with the
radiolabeled substrates.

In another preferred embodiment, the surface of the beads used in this assay 
is functionalized with a sufficient amount of a compound that interacts with a glycol, such as 
compounds containing aryl or alkylboronate (boronic acid). Such beads can be obtained by 
chemical modification of commercially available SPA beads by reactions known to one 
skilled in the art. In a preferred embodiment, the reactions used to modify the beads are 
analogous to those used for the preparation of arylboronate-modified resins for solid-phase 
extraction or chromatography. After incubation, the beads are washed with a sufficient
amount of water or other suitable solvent and subjected to quantitative determination of 
radioactivity. 

One can also determine amounts of glycol produced by oxidation of a
π-bond by taking advantage of the reactive nature of the substrate. Samples of culture fluids,
or extracts in an appropriate solvent, can be treated with known excess amounts of dilute
solutions of, for example, a halogen (Cl2, Br2, I2) or permanganate salts. The residual excess
amount of those reagents, left after reaction with any substrate present, can be measured by
chemical methods known in the art for determination of these compounds (see, for example,
Vogel's Practical Organic Chemistry, 5th Ed., Furniss et al., Eds., Longman Scientific
and Technical, Essex, 1989).
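The back-titration described above reduces to two subtractions. The sketch below makes its assumptions explicit: the reagent reacts only with the residual olefin, one mole of reagent is consumed per mole of olefin (as for Br2 addition to a single C=C bond), and all converted substrate ends up as glycol. None of these is stated in the text, and permanganate in particular would need a different stoichiometric factor.

```python
def residual_olefin_umol(reagent_added, reagent_left, olefin_per_reagent=1.0):
    """Olefin still present, inferred from the reagent consumed (umol)."""
    consumed = reagent_added - reagent_left
    if consumed < 0:
        raise ValueError("more reagent left than was added")
    return consumed * olefin_per_reagent  # assumed stoichiometry


def glycol_formed_umol(olefin_initial, reagent_added, reagent_left,
                       olefin_per_reagent=1.0):
    """Glycol formed = initial olefin - residual olefin (assumes full mass balance)."""
    residual = residual_olefin_umol(reagent_added, reagent_left, olefin_per_reagent)
    return olefin_initial - residual


# Hypothetical sample: 100 umol olefin fed, 150 umol Br2 added, 90 umol Br2 left
glycol = glycol_formed_umol(100.0, 150.0, 90.0)
```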

Mass spectrometry can also be used to determine the amount of a vicinal
glycol formed due to species encoded by the libraries of shuffled oxygenase genes. Mass
spectrometric methods allow ion peaks to be detected. The ion peaks derived from the
vicinal glycol can be readily distinguished from peaks derived from olefin substrates. In a
preferred embodiment, coordination ion spray or electrospray mass spectrometry is utilized.

In another preferred embodiment, a compound that interacts with a
component of the mixture, preferably the glycol, is utilized to enhance the sensitivity and
selectivity of the method. In a presently preferred embodiment, the sample analyzed
contains excess arylboronic or alkylboronic acid. Preferred boronic acids are those
containing at least one nitrogen atom and include, but are not limited to,
dansylaminophenylboronic acid, aminophenylboronic acid and pyridylboronic acid.

The ions detected in the mass spectrum derive from cyclic boronate ester
derivatives of the glycols with a boronic acid. The samples are preferably analyzed in
non-acidic and non-basic organic solvent or aqueous phase, substantially free of alcohols and
other glycols. Other appropriate analytical conditions will be apparent to those of skill in the
art.
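Because a cyclic boronate ester forms from a vicinal glycol and a boronic acid with loss of two molecules of water, the mass to look for can be estimated from the two elemental formulas. The sketch below computes the neutral monoisotopic mass only; the example pair (1-phenyl-1,2-ethanediol with aminophenylboronic acid) is chosen for illustration, and the ion actually observed will depend on the ionization mode and adduct formed.

```python
# Monoisotopic masses of the most abundant isotopes (u)
MONOISOTOPIC = {"C": 12.0, "H": 1.007825, "B": 11.009305,
                "N": 14.003074, "O": 15.994915}


def mass(formula):
    """Monoisotopic mass of an elemental formula given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items())


def cyclic_boronate(diol, boronic_acid):
    """Formula of the cyclic ester: diol + boronic acid - 2 H2O."""
    ester = dict(diol)
    for el, n in boronic_acid.items():
        ester[el] = ester.get(el, 0) + n
    ester["H"] -= 4  # two water molecules lost on condensation
    ester["O"] -= 2
    return ester


styrene_glycol = {"C": 8, "H": 10, "O": 2}                        # 1-phenyl-1,2-ethanediol
aminophenylboronic_acid = {"C": 6, "H": 8, "B": 1, "N": 1, "O": 2}
ester = cyclic_boronate(styrene_glycol, aminophenylboronic_acid)
ester_mass = mass(ester)
```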

Another preferred method for quantitating the glycols uses periodic acid or its
salts, preferably the sodium salts, to cleave the vicinal glycols to the corresponding
aldehydes. In a preferred embodiment, vicinal diols other than the analyte (e.g.,
carbohydrates) are excluded from the aqueous or organic solvent samples. This is easily
attained by using non-carbohydrate carbon sources to grow the microbial cells, and/or by
removal of the cells from the media by centrifugation or filtration prior to contacting of the
sample with periodate reagent. The periodate reagent can be used in solution, or preferably,
immobilized on a solid phase (e.g., anion exchange resin). After reacting the glycol with an
excess of periodate ion, the amount of free aldehyde groups can be measured by a variety of
assays known in the art. In a preferred method, the aldehydes are quantitated by a method
based on the formation of a colored hydrazone derivative. Alternatively, when using
radioactively labeled olefins for biotransformation, the free aldehydes obtained by this
method can be trapped by aldehyde reactive groups (e.g., free amines) on the surface of
appropriately modified SPA beads or membranes.

(ii). Methods for detecting alternative regioselectivity of oxidation of species
with multiple π-bonds

In one embodiment, the substrate includes more than one π-bond (e.g.,
styrene, butadiene, etc.). In a preferred embodiment, one of the π-bonds undergoes reaction
more readily than the other. In this embodiment, it is generally preferred to determine which
of the π-bonds underwent reaction. The preferred method for making this determination is
¹H or ¹³C NMR, although other methods can be used. Other methods include, for example,
chromatography (e.g., TLC, GC, HPLC, etc.), UV/vis spectroscopy and IR spectroscopy. In
an embodiment wherein the reaction is operating in a high throughput mode, the method of
choice is flow-through ¹H or ¹³C NMR spectroscopy.

When ¹³C NMR is used, the substrates are preferably labeled with ¹³C.
π-Bonded species can be synthesized by methods known in the art from a ¹³C enriched material
to incorporate one, or any combination of several, labeled carbon atom(s) into the structure
of these compounds. The enrichment levels for the labeled positions are preferably at least
5% of ¹³C, more preferably 50% and more preferably still 95% for any given labeled
position. Incorporation of a ¹³C label provides a number of advantages, such as increasing
the NMR signal and decreasing time required for spectral acquisition. Moreover, labeled
compounds allow for a quantitative or semi-quantitative interpretation of the composition of
a mixture of isomeric oxidation products. Preferably, incubations with ¹³C labeled olefins
are conducted in multi-well plates, and aliquots of culture fluids or their extracts are sampled
with an autosampler communicating with the NMR probe. In another preferred
embodiment, the reaction components are not chromatographed or otherwise purified prior
to obtaining an NMR spectrum.
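The benefit of the enrichment levels quoted above can be estimated against the natural abundance of ¹³C, which is about 1.1%. The sketch below rests on two standard assumptions that are not taken from the text: the signal from a labeled position scales linearly with its ¹³C fraction, and signal-to-noise grows with the square root of the number of scans, so the time needed for equal signal-to-noise falls with the square of the signal gain.

```python
NATURAL_ABUNDANCE_13C = 0.011  # approximate natural 13C fraction


def signal_gain(enrichment):
    """Signal at a labeled position relative to natural abundance."""
    return enrichment / NATURAL_ABUNDANCE_13C


def acquisition_time_factor(enrichment):
    """Factor by which acquisition time shrinks for equal signal-to-noise."""
    return signal_gain(enrichment) ** 2


# The three enrichment levels named in the text
for level in (0.05, 0.50, 0.95):
    print(f"{level:.0%} 13C: signal x{signal_gain(level):.1f}, "
          f"time / {acquisition_time_factor(level):.0f}")
```

On these assumptions even the minimum 5% enrichment gives a several-fold gain in signal, which is what makes unpurified, flow-through sampling from multi-well plates practical.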

Determining the absolute configuration and the enantiomeric composition of
the glycols formed from π-bonded species preferably employs a variation of the method
described above for determining regioselectivity of dihydroxylation of the olefinic substrates
by a monooxygenase using ¹H or ¹³C NMR. In a preferred embodiment, the substrates are
labeled with ¹³C and ¹³C NMR is employed. This method preferably involves the use of a
chiral and essentially enantiomerically pure derivatizing reagent such as a substituted
arylboronic acid which forms cyclic boronate derivatives with vicinal glycols, as known in
the art (references: Resnick, Gibson, 1997, cite). In a preferred embodiment, both the
substrates and one or more carbon atoms of the boronic acid are labeled with ¹³C. Although a
broad range of boronic acids are of use in the present invention, a currently preferred boronic
acid is shown below:
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The absolute configuration of any chiral center of the compounds produced 
by the methods of the invention can be either R or S. In presently preferred embodiments, 
the enantiomeric excess of the product is preferably 98% or more. NMR signals of different 
enantiomers of the reaction products can be distinguished in diastereomeric products using 
substantially enantiomerically pure boronate compounds as discussed above. Moreover, the 
relative intensity of the NMR signals arising from corresponding atoms of the diastereomeric 
products can be used for estimating the enantiomeric composition of the product(s) present 
in the sample. 
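Estimating the enantiomeric composition from the relative NMR intensities of the two diastereomeric derivatives uses the usual definition of enantiomeric excess. The sketch below assumes that the two signals come from corresponding atoms, respond equally, and are fully resolved; the intensities are hypothetical integrals.

```python
def enantiomeric_excess(i_major, i_minor):
    """Enantiomeric excess in percent from two integrated signal intensities."""
    total = i_major + i_minor
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return 100.0 * (i_major - i_minor) / total


# Hypothetical integrals: a 99:1 ratio corresponds to the 98% level preferred above
ee = enantiomeric_excess(99.0, 1.0)
meets_target = ee >= 98.0
```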

(iii). Methods for detecting alternative regioselectivity of oxidation of
alkylarenes

Useful methods for determining the regioselectivity of the oxidation of 
alkylarene compounds are substantially similar to those described in section (ii), supra. 

2. AHA formation from glycols 

Among methods for specifically measuring the free AHAs produced in the
biocatalytic process, those which are particularly preferred are methods using a variation of
the scintillation proximity assay described above. These methods preferably use an excess
of beads or membranes bearing one or more positively charged functional groups (e.g.,
quaternary or tertiary or primary amines). In preferred embodiments, these beads or
membranes act as an anion exchange medium and they selectively trap free AHAs, thereby
removing them from aqueous culture broths. In another preferred embodiment, this method
employs a radioactively labeled starting material, or subsequent intermediate (e.g., glycol,
epoxide, etc.). The radioactively labeled compound interacts with the beads or membrane.
Prior to measuring the radioactivity associated with the beads or the membrane,
non-specifically adsorbed label is preferably removed by evaporating excess radioactive
compound and/or washing with an aqueous solution which does not cause elution of the
AHAs from the anion-exchange beads or membrane.

Preferred methods for determining the chirality and absolute configuration of 
AHAs formed in the described biotransformation process are substantially similar to those 
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methods employed in making these determinations with respect to the glycols, as discussed 
above. 

3. Methods for determination of HCAs
In HTP mode, a preferred analytical method is flow-through ¹H or ¹³C NMR
spectroscopy. In the ¹³C NMR mode, the aromatic substrate for oxidation by a
monooxygenase is preferably labeled by the ¹³C isotope. Alkylaryl compounds or the
corresponding arylalkanoic acids are synthesized by methods known in the art from a ¹³C
enriched material to incorporate one, or any combination of several, labeled carbon atom(s)
into the structure of these compounds. The enrichment levels for any labeled position are
preferably at least 5% of ¹³C, and more preferably at least 95%. Incorporation of ¹³C label
increases sensitivity of the NMR measurement, decreases time required for acquisition of
spectrum per sample, and allows for quantitative or semi-quantitative interpretation of
compositions of mixtures of isomeric oxidation products. Preferably, incubations with ¹³C
labeled precursors are conducted in multi-well plates, and aliquots of culture fluids or their
extracts are sampled with an autosampler connected to the solvent line passing through the
NMR probe without any column separation.

For determining absolute configuration and enantiomeric composition of the
HCAs, a variation of the methods described above for determining reaction regioselectivity
by ¹H or ¹³C NMR is used. In conjunction with the preferred use of ¹³C labeled substrates,
¹³C NMR is preferably employed.

The absolute configuration of any chiral center may be either R or S. In a
preferred embodiment, the enantiomeric excess is 98% or more. NMR signals of different
enantiomers of HCAs can be distinguished in diastereomeric products using known methods,
such as NMR in conjunction with lanthanide shift reagents, or after derivatization to
Mosher's esters. Alternatively the enantiomeric excess can be determined by chiral GC.

In another preferred embodiment, a variation of the SPA method is used. In
this version, a solid support, such as beads or a membrane containing a suitable scintillation
dye is used. The solid support is modified with positively charged groups such that it acts
like an anion-exchange material. These materials can be prepared from commercially
available SPA materials and they can be used to trap free acids directly in the aqueous
medium or culture broths obtained by incubation of the host cells with a radiolabeled
alkylarene.



4. Methods for determination of esters 
In the interest of brevity, the following discussion focuses on the
determination of esters of AHAs. One of skill will appreciate that the same, or similar,
methods can be used to determine esters of other compounds formed using the methods of
the invention.

Both spectroscopic and non-spectroscopic methods can be used to quantitate
the extent of ester synthesis and to characterize the esters. The preferred non-spectroscopic
method for assaying AHA methyl ester formation catalyzed by methyl transferases is based
on use of radioactively labeled precursors to AHA methyl esters. ¹⁴C or ³H methyl labeled
SAM (or its in-vivo precursor, methionine) can be used as a probe. In another preferred
embodiment, the labeled substrate is the free α-hydroxycarboxylic acid itself.

Using the methods of the invention, methyltransferases that are selective for a
particular AHA enantiomer can be selected and further improved by iterative cycles of DNA
shuffling and this assay. The selectivity of the methyltransferases of the invention towards a
particular enantiomeric configuration of an AHA is preferably measured using samples of
the α-hydroxycarboxylic acids that are substantially enantiomerically pure. Host cells
employed in this biocatalytic cycle will preferably lack AHA racemase activity (e.g.,
mandelate racemase). In another preferred embodiment, the two AHA enantiomers carry
different radioactive labels, e.g., one enantiomer is labeled with ¹⁴C, and the other with ³H (at
one or more H positions which do not readily exchange with water). Measurement of the
radioactivity incorporated into the product is performed using a radioactivity detector that
allows for the selective measurement of at least two different isotopes. This variation allows
the evaluation of the enantioselectivity of a methyltransferase in a single sample.
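With each enantiomer carrying its own isotope, the two detector channels can be converted to amounts of product and compared. The sketch below is one way to do this under stated assumptions: the ¹⁴C label is placed on one enantiomer and the ³H label on the other (which is which is arbitrary here), both substrates are offered at equal concentration, and the specific activities (dpm per nmol) are known. All names and numbers are hypothetical.

```python
def product_nmol(dpm, specific_activity_dpm_per_nmol):
    """Convert measured disintegrations per minute into nmol of labeled product."""
    return dpm / specific_activity_dpm_per_nmol


def ester_ee_percent(dpm_14c, sa_14c, dpm_3h, sa_3h):
    """Excess of the 14C-labeled enantiomer in the ester product, in percent.

    Positive values favor the 14C-labeled enantiomer, negative the 3H-labeled one.
    """
    n14 = product_nmol(dpm_14c, sa_14c)
    n3 = product_nmol(dpm_3h, sa_3h)
    return 100.0 * (n14 - n3) / (n14 + n3)


# Hypothetical counts from the extracted methyl ester fraction of one well
ee = ester_ee_percent(dpm_14c=9000.0, sa_14c=100.0, dpm_3h=2000.0, sa_3h=200.0)
```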

The radioactivity associated with methyl esters of AHAs is preferably
measured in samples which are obtained by selective extraction or partitioning of the methyl
esters from neutral or moderately basic (pH about 6-10) aqueous culture samples. These
samples can contain varying amounts of free, labeled AHA, of AHA salts and other
non-labeled organic compounds. The samples are preferably obtained by incubating individual
clones expressing methyltransferase libraries with the labeled AHAs. The incubation
medium is subsequently extracted by adding a defined amount of a preferably
water-immiscible organic solvent, or by contacting the broth with an extraction medium (e.g.,
XAD-1180, or similar beads, or membrane).

In those embodiments employing an extraction medium, following its
removal from contact with the broth, the extraction medium is preferably washed to remove
adventitiously bound compounds. Preferred wash solutions are aqueous solutions that do not
elute the AHA methyl esters from the extraction medium, but which remove other molecules
adsorbed onto the medium. The radioactivity of the extracted material is then measured by
methods well known in the art. In embodiments using beads or a membrane an appropriate
scintillating dye is preferably used for detecting the radioactivity.

Substantially similar methods can also be employed for detecting other
neutral esters of AHAs, such as those exemplified by glycolides (e.g., XVI, Fig. 13) and
esters of type XX. Thus the same approach is useful for assaying and characterizing the
ester forming activity of polypeptides represented by libraries of acyl-transferases, or by a
combination of AHA-CoA: alcohol acyltransferases and AHA-CoA ligases. Variations on
this method can include the use of a radioactively labeled alcohol (e.g., XIX) or any of its
in-vivo metabolic precursors.

In another preferred embodiment, the method for detecting polypeptide 
activity leading to the formation of neutral AHA esters employs UV or fluorescence 
spectroscopy. This method is applicable to those embodiments in which the transferase 
activity yields products exhibiting distinct UV and/or fluorescent characteristics. Exemplary 
compounds include substituted or non-substituted esters of aromatic carboxylic 
acids (e.g., mandelic acid). In preferred embodiments of this method, a solvent or solid-phase 
extraction under neutral or moderately basic conditions (pH about 6-12) is performed 
on the cell culture medium. Compounds thus isolated are detected by measurement of their 
UV absorption or fluorescence. These spectral parameters are evaluated to determine 
relative amounts and identities of the products formed by the transferase reactions. 

a. Screening for improved transferase activity 

The screening of the transferase libraries, obtained by DNA shuffling or other 
methods as described above, is done most easily in bacterial or yeast systems by one or more 
of the screening methods described below. 

(i) Methods for detecting increased activity of transferase reactions 

The methods for detection of increased formation of monoacyl- and 
monoglycosyl-derivatives of, for example, glycols and α-hydroxycarboxylic acids include 
methods in which physical differences between the substrates, the cis-diols and the 
derivatives arising from the transferase-catalyzed reactions are measured. Preferred methods 
include HPLC and mass-spectrometry. In a high throughput modality, a method of choice is 
mass-spectrometry, preferably coordination ion and/or electrospray mass-spectrometry. 

For acyl transferases, another presently preferred method uses a labeled acyl-donor 
precursor, e.g., a labeled carboxylic acid or its derivative, administered to the cells that 
express libraries of shuffled genes encoding acyl ligases and/or acyl transferases, e.g., acyl-CoA 
ligases and acyl-CoA transferases. The amount of label in the hydrophobic reaction 
products is measured after extraction of the labeled derivatives into a suitable organic 
solvent, or after solid-phase extraction of these compounds by addition of a sufficient 
amount of hydrophobic porous resin beads (e.g., XAD-1180, XAD-2, -4, -8). In the case of 
a radiolabeled compound, scintillating dye can be present in the organic solvent, added to the 
samples, or chemically incorporated in the bead polymer. The latter constitutes a 
modification of the scintillation proximity assay method. 

(ii) Methods for detecting regioselectivity of transferase reactions 

The methods for detecting regioselectivity of the transferase reactions include 
HPLC and, in an HTP modality, flow-through NMR spectroscopy. When NMR 
spectroscopy is used for determining relative amounts of different regiomeric monoacyl or 
monoglycosyl derivatives of oxidized substrates, the latter are preferably obtained by action 
of the arene monooxygenases on isotopically (¹³C and/or ²H) labeled substrate. Another 
variation of the NMR technique includes use of isotopically labeled precursors of acyl- or 
glycosyl-donor intermediates. 

5. Selecting for enhanced organic solvent resistance. 
Selection for recombinant polynucleotides that provide improved organic 
solvent resistance can be accomplished by introducing the library of recombinant 
polynucleotides into a population of microorganism cells and subjecting the population to a 
medium that contains various concentrations of the organic hydrophobic compounds of 
interest. The medium can contain, for example, carbon, nitrogen and minerals, and 
preferably does not otherwise limit growth and viability of the cells in the absence of the 
solvent, thus ensuring that solvent resistance is essentially the only limiting factor affecting 
growth of the cells expressing variants of the genes encoding solvent resistance traits. 

In other embodiments, one can employ a screening strategy to identify those 
recombinant polynucleotides that encode polypeptides that confer improved solvent 
resistance. For example, one can screen based on the in vivo expression of a reporter gene, 
such as those encoding fluorescent proteins (exemplified by the green fluorescent protein, 
GFP). Preferably, for the purpose of detecting the best solvent resistance genes under 
essentially stationary growth phase conditions, those reporter genes are used which display 
their function in a fashion dependent on availability of intracellular reducing pools, such as 
NADH and NADPH, and essentially unimpaired ribosomal biosynthesis of proteins. 
Such genes can be exemplified by several bacterial luciferase gene 
clusters (lux) which contain not only luciferase components, but also all polypeptides 
required for in vivo regeneration of the aldehyde substrate for luciferase. 

A variety of methods can be used to detect and to pick or to enrich for the 
clones with the most efficient solvent resistance traits as judged by display of the properties 
associated with the in vivo reporter genes. These methods include, for example, 
fluorescence-activated cell sorting of liquid cell suspensions (e.g., cells that express GFP) 
and CCD camera imaging of individual colonies grown on a solid(ified) medium (e.g., for 
cells that express lux). 

If additional improvement in solvent resistance is desired, one can carry out a 
series of cycles of iterative DNA shuffling and selection by growing the cells in the presence 
of the organic solvent. Concentrations of the solvents used for selective growth conditions 
are incrementally increased after each round of recursive mode DNA shuffling in order to 
provide more stringent selective pressure for those organisms expressing solvent resistance 
genes. 
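
The incremental tightening of selection described above can be pictured with a small Python sketch. Everything in it is hypothetical: the starting concentration, the step size, the clone names and the idea of summarizing each clone by a single tolerance value are illustrative assumptions, not parameters given in the specification.

```python
def solvent_schedule(start_pct: float, step_pct: float, rounds: int) -> list[float]:
    """Solvent concentration (% v/v) applied in each shuffling/selection round.

    The concentration rises by a fixed step after every round, which is one
    simple way to make the selective pressure more stringent over time.
    """
    if rounds < 1 or start_pct < 0 or step_pct <= 0:
        raise ValueError("need rounds >= 1, start_pct >= 0 and step_pct > 0")
    return [start_pct + i * step_pct for i in range(rounds)]


def survivors(tolerance_by_clone: dict[str, float], solvent_pct: float) -> list[str]:
    """Clones whose (hypothetical) tolerance meets the current concentration."""
    return sorted(clone for clone, tol in tolerance_by_clone.items()
                  if tol >= solvent_pct)
```

In use, the clones returned by `survivors` for one round would serve as parents for the next round of shuffling, which is then selected at the next concentration in the schedule.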

For use in a high throughput screening protocol, the increase in the solvent 
resistance to a particular compound of interest and relevance to the biocatalytic synthesis of 
interest can also be directly measured by administering a radioactively labeled compound 
and determining relative distribution of radioactivity between cell biomass and extracellular 
medium components, similar to the method described by Ramos et al., J. Bacteriol. 
180:3323-3379 (1998). 
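
A minimal sketch of the distribution measurement, assuming the biomass and the extracellular medium have been separated and counted independently. The function name and the use of DPM as the unit are assumptions made for illustration; how a given fraction is interpreted depends on the assay design and is not fixed here.

```python
def cell_associated_fraction(dpm_biomass: float, dpm_medium: float) -> float:
    """Fraction of the recovered label that is found in the cell biomass."""
    if dpm_biomass < 0 or dpm_medium < 0:
        raise ValueError("counts cannot be negative")
    total = dpm_biomass + dpm_medium
    if total == 0:
        raise ValueError("no label recovered in either fraction")
    return dpm_biomass / total
```

Comparing this fraction between a shuffled clone and its parent, measured under identical conditions, gives a single number per well that can be ranked across a plate.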
F. Bioreactors 

In another aspect, the invention provides a bioreactor system for carrying out 
biotransformations using the improved polypeptides of the invention. The bioreactor 
includes: (a) an improved monooxygenase polypeptide of the invention; (b) a redox partner 
source; (c) oxygen; and (d) a substrate for oxidation. 

In a preferred embodiment, the monooxygenase polypeptide is an arene 
monooxygenase polypeptide. 

In another preferred embodiment, the bioreactor further includes another 
useful polypeptide, such as a transferase, ligase, dehydrogenase and the like. The additional 
useful polypeptide(s) can be co-expressed by a host cell also expressing the improved 
monooxygenase or can be expressed by a host cell that does not express the improved 
monooxygenase. Moreover, each of the polypeptides incorporated into the reactor can be 
provided as a constituent of a whole cell preparation, a polypeptide extract or as a 
substantially pure polypeptide. The cells and/or polypeptides can be in suspension or solution, 
or they can be immobilized on an insoluble matrix, bead or other particle. Additional 
considerations are discussed below. This discussion is intended as illustrative and not 
limiting. Other bioreactor formats, conditions, etc. will be apparent to those of skill in the 
art. 

General growth conditions for culturing the particular organisms are obtained 
from depositories and from texts known in the art such as Bergey's Manual of 
Systematic Bacteriology, Vol. 1, N. R. Krieg, ed., Williams and Wilkins, 
Baltimore/London (1984). 

For clarity of illustration, the discussion below focuses on the preferred 
conditions for the oxidation of an organic substrate using the polypeptides of the invention. 
It is understood that this focus is for the purpose of illustration and that similar conditions 
are applicable to pathways of the invention other than oxidation. 

The nutrient medium for the growth of any oxidizing microorganism should 
contain sources of assimilable carbon and nitrogen, as well as mineral salts. Suitable sources 
of assimilable carbon and nitrogen include, but are not limited to, complex mixtures, such as 
those constituted by biological products of diverse origin, for example soy bean flour, cotton 
seed flour, lentil flour, pea flour, soluble and insoluble vegetable proteins, corn steep liquor, 
yeast extract, peptones and meat extracts. Additional sources of nitrogen are ammonium 
salts and nitrates, such as ammonium chloride, ammonium sulfate, sodium nitrate and 
potassium nitrate. Generally, the nutrient medium should include, but is not limited to, the 
following ions: Mg²⁺, Na⁺, K⁺, Ca²⁺, NH₄⁺, Cl⁻, SO₄²⁻, PO₄³⁻ and NO₃⁻, and also ions of the 
trace elements such as Cu, Fe, Mn, Mo, Zn, Co and Ni. The preferred source of these ions 
is mineral salts. 

If these salts and trace elements are not present in sufficient amounts in the 
complex constituents of the nutrient medium or in the water used it is appropriate to 
supplement the nutrient medium accordingly. 

The microorganism employed in the process of the invention can be in the 
form of fermentation broths, whole washed cells, concentrated cell suspensions, polypeptide 
extracts, and immobilized polypeptides and/or cells. Preferably, concentrated cell 
suspensions, polypeptide extracts, and whole washed cells are used with the process of the 
invention (S. A. White and G. W. Claus, J. Bacteriology 150:934-943 (1982)). 
Methods of immobilizing polypeptides and cells are well known in the art and include such 
techniques as microencapsulation, attachment to alginate beads, cross-linked polyurethane, 
starch particles, polyacrylamide gels and the use of coacervates, which are aggregates of 
colloidal droplets. In a presently preferred embodiment, the polypeptide and/or cell is 
immobilized onto glass particles having a porous outer surface, such as those described in 
Dubin et al., U.S. Patent No. 5,922,531, issued July 13, 1999. 

Concentrated washed cell suspensions may be prepared as follows: the 
microorganisms are cultured in a suitable nutrient solution, harvested (for example by 
centrifuging) and suspended in a smaller volume (in salt or buffer solutions, such as 
physiological sodium chloride solution or aqueous solutions of potassium phosphate, sodium 
acetate, sodium maleate, magnesium sulfate, or simply in tap water, distilled water or 
nutrient solutions). The substrate is then added to a cell suspension of this type and the 
oxidation reaction according to the invention is carried out under the conditions described. 

The conditions for oxidizing a substrate in growing microorganism cultures 
or fractionated cell extracts are advantageous for carrying out the process according to the 
invention with concentrated cell suspensions. In particular, the temperature range is from 
about 0 °C to about 45 °C and the pH range is from about 2 to about 10. There are no 
special nutrients necessary in the process of the invention. More importantly, washed or 
immobilized cells can simply be added to a solution of substrate, without any nutrient 
medium present. 

It is also possible to carry out the process according to the invention with 
polypeptide extracts or polypeptide extract fractions prepared from cells. The extracts can 
be crude extracts, such as those obtained by conventional digestion of microorganism cells. 
Methods to break up cells include, but are not limited to, mechanical disruption, physical 
disruption, chemical disruption, and enzymatic disruption. Such means to break up cells 
include ultrasonic treatments, passages through French pressure cells, grindings with quartz 
sand, autolysis, heating, osmotic shock, alkali treatment, detergents, or repeated freezing and 
thawing. 

If the process according to the invention is to be carried out with partially 
purified polypeptide extract preparations, the methods of protein chemistry, such as 
ultracentrifuging, precipitation reactions, ion exchange chromatography or adsorption 
chromatography, gel filtration or electrophoretic methods, can be employed to obtain such 
preparations. In order to carry out the reaction according to the invention with fractionated 
cell extracts, it may be necessary to add to the assay system additional reactants such as 
physiological or synthetic electron acceptors, like NAD⁺, NADP⁺, methylene blue, 
dichlorophenolindophenol, tetrazolium salts and the like. When these reactants are used, 
they can be employed either in equimolar amounts (concentrations which correspond to that 
of the substrate employed) or in catalytic amounts (concentrations which are markedly below 
the chosen concentration of substrate). If, when using catalytic amounts, it is to be ensured 
that the process according to the invention is carried out approximately quantitatively, a 
system which continuously regenerates the reactant which is present only in a catalytic 
amount must also be added to the reaction mixture. This system can be, for example, a 
polypeptide which ensures reoxidation (in the presence of oxygen or other oxidizing agents) 
of an electron acceptor which is reduced in the course of the reaction according to the 
invention. 
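
The distinction between equimolar and catalytic amounts can be made concrete with a short calculation. The sketch below assumes, purely for illustration, a 1:1 stoichiometry between electron acceptor consumed and substrate converted; the function names and the millimolar units are hypothetical.

```python
def needs_regeneration(substrate_mM: float, acceptor_mM: float) -> bool:
    """True when the electron acceptor is present in less than equimolar
    amount, so quantitative conversion requires a regenerating system."""
    if substrate_mM <= 0 or acceptor_mM <= 0:
        raise ValueError("concentrations must be positive")
    return acceptor_mM < substrate_mM


def turnovers_per_acceptor(substrate_mM: float, acceptor_mM: float) -> float:
    """Average number of times each acceptor molecule must be used for
    complete conversion of the substrate (1:1 stoichiometry assumed)."""
    if substrate_mM <= 0 or acceptor_mM <= 0:
        raise ValueError("concentrations must be positive")
    return substrate_mM / acceptor_mM
```

For example, with 10 mM substrate and 0.5 mM acceptor each acceptor molecule has to cycle about 20 times, which is the load the regenerating system would have to sustain.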

If a nutrient medium is used with intact microorganisms in a growing culture, 
the nutrient medium can be solid, semi-solid or liquid. Aqueous liquid nutrient media are 
preferably employed when media are used. Suitable media and suitable conditions for 
cultivation include known media and known conditions to which substrate can be added. 

The substrate to be oxidized in the process of the invention can be added to 
the base nutrient medium either on its own or as a mixture with one or more oxidizable 
compounds. Additional oxidizable compounds which can be used include polyols, such as 
sorbitol or glycerol. 

If one or more oxidizable compounds are added to the nutrient solution, the 
substrate to be oxidized can be added either prior to inoculation or at any desired subsequent 
time (between the early log phase and the late stationary growth phase). In such a case the 
oxidizing organism is preferably pre-cultured with the oxidizable compounds. The 
inoculation of the nutrient media is effected by a variety of methods including slanted tube 
cultures and flask cultures. 

Contamination of the reaction solution should be avoided. To avoid 
contamination, sterilization of the nutrient media, sterilization of the reaction vessels and 
sterilization of the air required for aeration are preferably undertaken. It is possible to use, for 
example, steam sterilization or dry sterilization for sterilization of the reaction vessels. The 
air and the nutrient media can likewise be sterilized by steam or by filtration. Heat 
sterilization of the reaction solution containing the substrate is also possible. 

The process of the invention can be carried out under aerobic conditions 
using shake flasks or aerated and agitated tanks. Preferably, the process is carried out by the 
aerobic submersion procedure in tanks, for example in conventional fermentors. It is 
possible to carry out the process continuously or with batch or fed-batch modes, preferably 
the batch mode. 

It is advantageous to ensure that the microorganisms are adequately brought 
into contact with oxygen and the substrate. This can be effected by several methods 
including shaking, stirring and aerating. 

If foam occurs in an undesired amount during the process, chemical foam 
control agents, such as liquid fats and oils, oil-in-water emulsions, paraffins, higher alcohols 
(such as octadecanol), silicone oils, polyoxyethylene compounds and polyoxypropylene 
compounds, can be added. Foam can also be suppressed or eliminated with the aid of 
mechanical devices. 

G. Kits 

Also provided is a kit or system utilizing any one of the selection strategies, 
materials, components, methods or substrates hereinbefore described. Kits will optionally 
additionally include instructions for performing methods or assays, packaging materials, one 
or more containers which contain assay, device or system components, or the like. 

In an additional aspect, the present invention provides kits embodying the 
methods and apparatus herein. Kits of the invention optionally include one or more of the 
following: (1) a shuffled component as described herein; (2) instructions for practicing the 
methods described herein, and/or for operating the selection procedure herein; (3) one or 
more monooxygenase assay components; (4) a container for holding monooxygenase nucleic 
acids or polypeptides, other nucleic acids, transgenic plants, animals, cells, or the like; and 
(5) packaging materials. 

In another preferred embodiment, the kit provides a library of improved 
P-450s that have been produced by shuffling for improved stability, ease of handling, etc. The 
polypeptides in this library have catalytic activities that are substantially identical to those of 
the P-450s found in microsome preparations used to screen drugs and other xenobiotic compounds. 

In a further embodiment, the present invention provides for the use of any 
component or kit herein, for the practice of any method or assay herein, and/or for the use of 
any apparatus or kit to practice any assay or method herein. 

In yet another embodiment, the kit of the invention includes one or more 
improved monooxygenase polypeptides of the invention. In a preferred embodiment, the kit 
includes a library of improved monooxygenase polypeptides. 

It is understood that the examples and embodiments described herein are for 
illustrative purposes only and that various modifications or changes in light thereof will be 
suggested to persons skilled in the art and are to be included within the spirit and purview of 
this application and are considered within the scope of the appended claims. All 
publications, patents, and patent applications cited herein are hereby incorporated by 
reference in their entirety for all purposes. 

WHAT IS CLAIMED IS: 

1. A method for obtaining a polynucleotide that encodes an improved 
polypeptide comprising monooxygenase activity, wherein said improved polypeptide has at 
least one property improved over a naturally occurring monooxygenase polypeptide, said 
method comprising: 

(a) creating a library of recombinant polynucleotides encoding 
a recombinant monooxygenase polypeptide; and 

(b) screening said library to identify a recombinant 
polynucleotide that encodes an improved recombinant monooxygenase 
polypeptide that has at least one property improved over said naturally 
occurring polypeptide. 

2. The method according to claim 1, wherein said creating a library 
comprises: 

shuffling a plurality of parental polynucleotides to produce one or 
more recombinant monooxygenase polynucleotides encoding said improved property. 

3. The method according to claim 1, wherein said monooxygenase 
activity is a member selected from alkene epoxidation, alkane hydroxylation, aromatic 
hydroxylation, N-dealkylation of alkylamines, S-dealkylation of reduced thio-organics, 
O-dealkylation of alkyl ethers, oxidation of aryloxy phenols, conversion of aldehydes to acids, 
dehydrogenation, decarbonylation, oxidative dehalogenation of haloaromatics and 
halohydrocarbons, Baeyer-Villiger monooxygenation, modification of cyclosporins, 
hydroxylation of mevastatin, oxygenation of sulfonylureas and combinations thereof. 

4. The method of claim 2, wherein at least one of said parental 
polynucleotides encodes at least one monooxygenase activity. 

5. The method of claim 2, wherein said parental polynucleotides are 
homologous. 

6. The method of claim 2, wherein at least one of said parental 
polynucleotides does not encode a monooxygenase activity. 

7. The method of claim 2, wherein said parental monooxygenase 
polynucleotide encodes a polypeptide or polypeptide subsequence selected from a P450 
oxygenase, a heme-dependent peroxidase, an iron sulfur monooxygenase, a quinone-dependent 
monooxygenase and combinations thereof. 

8. The method of claim 2, wherein a member selected from said parental 
polynucleotides, said one or more recombinant monooxygenase polynucleotides, said 
identified recombinant monooxygenase polynucleotide and combinations thereof is cloned 
into an expression vector. 

9. The method of claim 1, wherein said identified recombinant 
monooxygenase polynucleotide has an ability to catalyze an enzymatic reaction using a 
redox partner other than NADPH. 

10. The method of claim 2, further comprising: 
creating a library of recombinant peroxide production activity 
polynucleotides encoding a recombinant hydrogen peroxide production activity; 

screening said library to identify a recombinant polynucleotide that encodes 
an improved hydrogen peroxide production activity; and 

co-expressing one or more of said identified hydrogen peroxide production 
activity polynucleotides and said identified recombinant monooxygenase polynucleotide in a 
cell. 

11. The method of claim 2, further comprising: 

creating a library of recombinant epoxide hydrolase activity polynucleotides 
encoding a recombinant epoxide hydrolase activity; 

screening said library to identify a recombinant polynucleotide that encodes 
an improved epoxide hydrolase activity; and 

co-expressing one or more of said identified recombinant epoxide hydrolase 
activity polynucleotides and said identified recombinant monooxygenase polynucleotide in a 
cell. 

12. The method of claim 2, further comprising: 

creating a library of recombinant dehydrogenase activity polynucleotides 
encoding a recombinant dehydrogenase activity; 

screening said library to identify a recombinant polynucleotide that encodes 
an improved dehydrogenase activity; and 

co-expressing one or more of said identified recombinant dehydrogenase 
activity polynucleotides and said identified recombinant monooxygenase polynucleotide in a 
cell. 

13. The method of claim 1, further comprising: 

creating a library of recombinant transferase activity polynucleotides 
encoding a recombinant transferase activity; 

screening said library to identify a recombinant polynucleotide that encodes 
an improved transferase activity; and 

co-expressing one or more of said identified recombinant transferase activity 
polynucleotides and said identified recombinant monooxygenase polynucleotide in a cell. 

14. The method according to claim 13, wherein said transferase 
polynucleotide is a member selected from acyltransferases, glycosyltransferases, methyl 
transferases and combinations thereof. 

15. The method of claim 2, wherein said plurality of parental 
polynucleotides are shuffled to produce a library of recombinant polynucleotides comprising 
one or more library member polynucleotides encoding one or more monooxygenase activities, 
which library is selected for one or more monooxygenase activities selected from alkene 
epoxidation, alkane hydroxylation, aromatic hydroxylation, N-dealkylation of alkylamines, 
S-dealkylation of reduced thio-organics, O-dealkylation of alkyl ethers, oxidation of aryloxy 
phenols, conversion of aldehydes to acids, dehydrogenation, decarbonylation, oxidative 
dehalogenation of haloaromatics and halohydrocarbons, Baeyer-Villiger monooxygenation, 
modification of cyclosporins, hydroxylation of mevastatin, conversion of cholesterol to 
pregnenolone, and oxygenation of sulfonylureas. 

16. A library of recombinant polynucleotides comprising one or more 
monooxygenase activity made by said method of claim 1. 

17. The library of claim 16, wherein said library is a phage display 
library. 

18. An improved monooxygenase encoding nucleic acid prepared by the 
method according to claim 1. 

19. The method of claim 2, wherein said parental polynucleotides are 
shuffled in a plurality of cells, which cells are prokaryotes or eukaryotes. 

20. The method of claim 2, wherein said parental polynucleotides are 
shuffled in a plurality of cells, which cells are yeast, bacteria, or fungi. 

21. The method of claim 2, wherein said parental polynucleotides are 
shuffled in a plurality of cells; said method optionally further comprises one or more 
members selected from: 

(a) recombining DNA from said plurality of cells that display 
monooxygenase activity with a library of DNA fragments, at least one of which undergoes 
recombination with a segment in a cellular DNA present in said cells to produce recombined 
cells, or recombining DNA between said plurality of cells that display monooxygenase 
activity to produce cells with modified monooxygenase activity; 

(b) recombining and screening said recombined or modified cells to produce 
further recombined cells that have evolved additionally modified monooxygenase activity; 
and 

(c) repeating (a) or (b) until said further recombined cells have acquired a 
desired monooxygenase activity. 

22. The method of claim 2, wherein said method further comprises: 

(a) recombining at least one distinct or improved recombinant polynucleotide 
with a further monooxygenase activity polynucleotide, which further polynucleotide is 
identical to or different from one or more of said plurality of parental polynucleotides to 
produce a library of recombinant monooxygenase polynucleotides; 

(b) screening said library to identify at least one further distinct or improved 
recombinant monooxygenase polynucleotide that exhibits a further improvement or distinct 
property compared to said plurality of parental polynucleotides; and, optionally, 

(c) repeating (a) and (b) until said resulting further distinct or improved 
recombinant polynucleotide shows an additionally distinct or improved monooxygenase 
property. 

23. The method of claim 2, wherein said recombinant monooxygenase 
polynucleotide is present in one or more bacterial, yeast, or fungal cells and said method 
comprises: 

pooling multiple separate monooxygenase polynucleotides; 

screening said resulting pooled monooxygenase polynucleotides to 
identify an improved recombinant monooxygenase polynucleotide that exhibits an 
improved monooxygenase activity compared to a non-recombinant monooxygenase activity 
polynucleotide; and 

cloning said improved recombinant nucleic acid. 

24. The method of claim 23, further comprising transducing said distinct 
or improved nucleic acid into a prokaryote or eukaryote. 

25. The method of claim 2, wherein said shuffling of a plurality of 
parental polynucleotides comprises family gene shuffling. 

26. The method of claim 2, wherein said shuffling of a plurality of 
parental nucleic acids comprises individual gene shuffling. 

27. A selected shuffled monooxygenase nucleic acid made by said method 
of claim 2. 

28. A DNA shuffling mixture, comprising: at least three homologous 
DNAs, each of which is derived from a polynucleotide encoding a member selected from a 
polypeptide encoding monooxygenase activity, a polypeptide fragment encoding 
monooxygenase activity and combinations thereof. 

29. The DNA shuffling mixture of claim 28, wherein said at least three 
homologous DNAs are present in cell culture or in vitro. 

30. A method for increasing monooxygenase activity in a cell, 
comprising: performing whole genome shuffling of a plurality of genomic polynucleotides in 
said cell and selecting for one or more monooxygenase activity. 

31. The method of claim 30, wherein said genomic nucleic acids are from 
a species or strain different from said cell. 

32. The method of claim 30, wherein said cell is of prokaryotic or 
eukaryotic origin. 


33. The method of claim 30, wherein said monooxygenase activity to be 
selected is alkene epoxidation, alkane hydroxylation, aromatic hydroxylation, N-dealkylation 
of alkylamines, S-dealkylation of reduced thio-organics, O-dealkylation of alkyl ethers, 
oxidation of aryloxy phenols, conversion of aldehydes to acids, dehydrogenation, 
decarbonylation, oxidative dehalogenation of haloaromatics and halohydrocarbons, 
Baeyer-Villiger monooxygenation, modification of cyclosporins, hydroxylation of mevastatin, 
conversion of cholesterol to pregnenolone, oxygenation of sulfonylureas and combinations 
thereof. 

34. A method for obtaining a polynucleotide encoding an improved 
polypeptide acting on a substrate comprising a target group selected from an olefin, a 
terminal methyl group, a methylene group, an aryl group and combinations thereof, wherein 
said improved polypeptide exhibits one or more improved properties compared to a naturally 
occurring polypeptide acting on said substrate, said method comprising: 

creating a library of recombinant polynucleotides that encode a 
monooxygenase polypeptide acting on said substrate; and 

screening said library to identify a recombinant polynucleotide 
encoding an improved polypeptide that exhibits one or more improved properties compared 
to a naturally occurring monooxygenase polypeptide. 

35. The method according to claim 34, wherein said library of recombinant 
polynucleotides is created by recombining at least a first form and a second form of a nucleic 
acid, at least one form encoding said naturally occurring polypeptide or a fragment thereof, 
wherein said first form and said second form differ from each other in two or more 
nucleotides. 

36. The method according to claim 35, wherein said first and second forms 
of said nucleic acid are homologous. 

37. The method according to claim 35, wherein at least one of said first 
and second forms of said nucleic acid does not encode a polypeptide having monooxygenase 
activity. 



38. A polypeptide encoded by a polynucleotide according to claim 34.

39. The polypeptide according to claim 38, wherein said polypeptide has an activity comprising converting an olefin to an epoxide.

40. The polypeptide according to claim 38, wherein said polypeptide has an activity comprising converting said terminal methyl group to a hydroxymethyl group.

41. The polypeptide according to claim 38, wherein said polypeptide has an activity comprising converting a methylene group to a hydroxymethylene group.

42. The polypeptide according to claim 38, wherein said polypeptide has an activity comprising converting an aryl group to a hydroxyaryl group.



43. The polypeptide according to claim 38, wherein said improved property is selected from:
    improved regiospecificity of said polypeptide acting on a substrate, wherein said substrate comprises at least two target groups;
    enhanced production of a desired enantiomeric form of a reaction product;
    enhanced expression of said polypeptide by a host cell that comprises said recombinant polynucleotide; and
    enhanced stability of said polypeptide in the presence of an organic solvent.

44. A method of oxidizing a substrate comprising a target group selected from an olefin, a terminal methyl group, a methylene group, an aryl group and combinations thereof, said method comprising contacting said substrate with a polypeptide according to claim 38.

45. The method according to claim 44, wherein the absolute configuration of a product of said monooxygenase is R, S, or a mixture thereof.

46. A method for preparing an epoxide group, said method comprising contacting a substrate comprising a carbon-carbon double bond with a polypeptide according to claim 39.

47. A method for preparing a hydroxymethyl group, said method comprising contacting a substrate comprising a terminal methyl group with a polypeptide according to claim 40.

48. A method for preparing a hydroxymethylene group, said method comprising contacting a substrate comprising a methylene group with a polypeptide according to claim 41.

49. A method for preparing a hydroxyaryl group, said method comprising contacting a substrate comprising an aryl group with a polypeptide according to claim 42.



50. An organism comprising a recombinant monooxygenase polynucleotide encoding an improved polypeptide that catalyzes a reaction selected from epoxidation of an olefin, hydroxylation of a terminal methyl group, hydroxylation of a methylene group, hydroxylation of an aryl group and combinations thereof, wherein said polypeptide exhibits one property improved relative to a corresponding property of a naturally occurring monooxygenase polypeptide.

51. The organism according to claim 50, further comprising an improved transferase polypeptide that exhibits one or more properties improved relative to a corresponding property of a naturally occurring transferase polypeptide.

52. The organism according to claim 51, wherein said transferase is selected from S-adenosylmethionine dependent O-methyltransferase, acyl-CoA transferase and combinations thereof.

53. The organism according to claim 50, further comprising an improved ligase peptide that exhibits one or more properties improved relative to a corresponding property of a naturally occurring ligase polypeptide.

54. The organism according to claim 53, wherein said ligase is an acyl CoA ligase.

55. The organism according to claim 50, further comprising an improved racemase polypeptide that exhibits one or more properties improved relative to a corresponding property of a naturally occurring racemase polypeptide.

56. The organism according to claim 55, wherein said racemase is mandelate racemase.

57. The organism according to claim 50, further comprising a dehydrogenase polypeptide that exhibits one or more properties improved relative to a corresponding property in a naturally occurring dehydrogenase polypeptide.

58. The organism according to claim 57, said organism dehydrogenating a hydroxyalkyl group of a substrate having the structure:



{CH(R^3)(CH2)sR'^}, 

{CH(R'')(CH2)nR^\ 



wherein
    \ R^^ and R*"* are independently selected from H and OH and at least one of R*\ R^', R^^ and R^' is OH;
    n and s are independently selected from the numbers 0 to 16; and
    p and t are independently selected from 0 to 6, wherein at least one of p and t must be at least one and p + t < 6,
said hydroxyalkyl group being dehydrogenated to a member selected from a carboxylic acid, a ketone carbonyl and an aldehyde carbonyl.

59. The organism according to claim 50, further comprising an improved solvent resistance polypeptide that confers upon said organism a resistance to an organic solvent that is improved relative to that conferred by a naturally occurring solvent resistance-conferring polypeptide.

60. The organism according to claim 59, wherein said improved solvent resistance polypeptide imparts to the organism a resistance to one or more organic compounds selected from olefins, α-hydroxycarboxylic acids, diols, aldehydes, ketones, halogenated hydrocarbons, perfluorocarbons, esters, aryl compounds, carboxylic acids, alcohols, ethers and combinations thereof.

61. The organism of claim 59, wherein said improved solvent resistance polypeptide imparts to the organism a resistance to said solvent, wherein the solvent is present in a medium at hypersaturating concentrations.

62. The organism according to claim 50, wherein said organism further comprises an epoxide hydrolase polypeptide that exhibits one or more properties improved relative to a corresponding property of a naturally occurring epoxide hydrolase polypeptide.

63. The organism according to claim 50, wherein said microorganism further comprises an epoxide isomerase polypeptide that exhibits one or more properties improved relative to a corresponding property of a naturally occurring epoxide isomerase polypeptide.



64. The organism of claim 50, wherein said organism further comprises two or more recombinant polynucleotides selected from the group consisting of
    an improved transferase polypeptide that exhibits one or more properties improved relative to a corresponding property of a naturally occurring transferase polypeptide;
    an improved epoxide hydrolase peptide that exhibits one or more properties improved relative to a corresponding property of a naturally occurring epoxide hydrolase polypeptide;
    an improved ligase peptide that exhibits one or more properties improved relative to a corresponding property of a naturally occurring ligase polypeptide;
    an improved racemase polypeptide that exhibits one or more properties improved relative to a corresponding property of a naturally occurring racemase polypeptide;
    an improved dehydrogenase polypeptide that exhibits one or more properties improved relative to a corresponding property of a naturally occurring dehydrogenase polypeptide;
    an improved epoxide isomerase polypeptide that exhibits one or more properties improved relative to a corresponding property of a naturally occurring epoxide isomerase polypeptide; and
    an improved solvent resistance polypeptide that confers upon said organism a resistance to an organic solvent that is improved relative to that conferred by a naturally occurring solvent resistance-conferring polypeptide.

65. A method for preparing an epoxide group, said method comprising contacting a substrate comprising a carbon-carbon double bond with an organism according to claim 50, thereby forming said epoxide group.

66. The method according to claim 65, wherein said substrate is selected from styrene, styrene substituted on the phenyl group, divinylbenzene, divinylbenzene substituted on the phenyl group, isoprene, butadiene, diallyl ether, allyl phenyl ether, allyl phenyl ether substituted on the phenyl group, allyl alkyl ether, allyl aralkyl ether, vinylcyclohexene, vinylnorbornene, and acrolein.

67. A method for converting an olefin into a vicinal diol, said method comprising:
    (a) contacting said olefin with an organism according to claim 50 to form an epoxide; and
    (b) contacting said epoxide with an organism comprising an epoxide hydrolase polypeptide, thereby forming said vicinal diol.

68. The method according to claim 67, wherein said epoxide hydrolase polypeptide exhibits one or more properties improved relative to corresponding properties of a naturally occurring epoxide hydrolase polypeptide.

69. The method according to claim 67, wherein said polypeptide of (a) and said polypeptide of (b) are expressed in the same host cell.

70. The method according to claim 67, further comprising:
    (c) contacting said vicinal diol with an organism comprising a polypeptide selected from a ligase polypeptide and a transferase polypeptide, thereby forming a vicinal diol adduct.

71. The method according to claim 70, wherein said polypeptide of (c) is a polypeptide exhibiting one or more properties improved over a corresponding property of an analogous naturally occurring polypeptide.



72. The method according to claim 70, wherein said polypeptide of (a), said polypeptide of (b) and said polypeptide of (c) are expressed in the same host cell.

73. The method according to claim 67, wherein said vicinal diol has the structure:

    [structural formula of the vicinal diol; not legible in source]

wherein
    R1 is selected from aryl, substituted aryl, heteroaryl, substituted heteroaryl, heterocyclyl, substituted heterocyclyl, —NR2R3, —OR2, —CN, C(R4)NR2R3 and C(R4)OR2 groups;
    R2 and R3 are members independently selected from H, alkyl, substituted alkyl, aryl, substituted aryl, heteroaryl, substituted heteroaryl, heterocyclyl and substituted heterocyclyl groups;
    R4 is selected from =O and =S; and
    n is a number between 0 and 10, inclusive.

74. The method according to claim 73, wherein
    R1 is selected from phenyl, substituted phenyl, pyridyl, substituted pyridyl, —NR2R3, —OR2, —CN, C(R4)NR2R3 and C(R4)OR2 groups;
    R2 and R3 are members independently selected from H, alkyl, substituted alkyl, aryl, substituted aryl, heteroaryl, substituted heteroaryl, heterocyclyl and substituted heterocyclyl groups; and
    R4 is selected from =O and =S.

75. A method for converting an olefin into an α-hydroxycarboxylic acid, said method comprising:
    (a) contacting said olefin with an organism according to claim 50 to form an epoxide;
    (b) contacting said epoxide with an organism comprising an epoxide hydrolase polypeptide to form a vicinal diol; and
    (c) contacting said vicinal diol with an organism comprising a dehydrogenase polypeptide to form said α-hydroxycarboxylic acid.

76. The method according to claim 75, wherein at least one of said hydrolase polypeptide and said dehydrogenase polypeptide exhibits at least one property improved relative to a corresponding property in an analogous naturally occurring polypeptide.

77. The method according to claim 78, wherein said polypeptide of (a), of (b) and of (c) are expressed in the same host cell.

78. A method for converting an olefin into an α-hydroxycarboxylic acid, said method comprising contacting said olefin with an organism according to claim 64, wherein said two or more recombinant polynucleotides are an improved epoxide hydrolase and an improved dehydrogenase.

79. The method according to claim 78, further comprising:
    (d) contacting said α-hydroxycarboxylic acid with an organism comprising an improved polypeptide having an activity selected from ligase, transferase and combinations thereof, thereby forming an α-hydroxycarboxylic acid adduct.

80. The method according to claim 79, wherein at least two of said polypeptides of (a), (b), (c) and (d) are expressed in the same host cell.

81. The method according to claim 79, wherein at least one of said polypeptide selected from ligase, transferase and combinations thereof is an improved polypeptide.

82. The method according to claim 78, wherein said α-hydroxycarboxylic acid has the structure:

    [structural formula of the α-hydroxycarboxylic acid; not legible in source]

wherein
    R1 is selected from aryl, substituted aryl, heteroaryl, substituted heteroaryl, heterocyclyl, substituted heterocyclyl, —NR2R3, —OR2, —CN, C(R4)NR2R3 and C(R4)OR2 groups;
    R2 and R3 are members independently selected from H, alkyl, substituted alkyl, aryl, substituted aryl, heteroaryl, substituted heteroaryl, heterocyclyl and substituted heterocyclyl groups;
    R4 is selected from =O and =S; and
    n is a number between 0 and 10, inclusive.

83. The method according to claim 82, wherein
    R1 is selected from phenyl, substituted phenyl, pyridyl, substituted pyridyl, —NR2R3, —OR2, —CN, C(R4)NR2R3 and C(R4)OR2 groups;
    R2 and R3 are members independently selected from H, alkyl, substituted alkyl, aryl, substituted aryl, heteroaryl, substituted heteroaryl, heterocyclyl and substituted heterocyclyl groups; and
    R4 is selected from =O and =S.

84. The method according to claim 79, wherein said transferase activity is selected from glycosyl transferase activity and methyltransferase activity.

85. The method according to claim 84, wherein said methyl transferase is S-adenosylmethionine dependent O-methyltransferase.

86. [claim preamble and structural formula missing in source]
    is selected from aryl, substituted aryl, heteroaryl, substituted heteroaryl, heterocyclyl, substituted heterocyclyl, — NR^R\R^)nt, — OR^, — CN, C(R^)NR^R^ and C(R^)OR^ groups;
    R', R^ and R"* are members independently selected from the group consisting of H, alkyl, substituted alkyl, aryl, substituted aryl, heteroaryl, substituted heteroaryl, heterocyclyl and substituted heterocyclyl groups;
    R^ is selected from =O and =S;
    R^ is selected from H, alkyl and substituted alkyl groups;
    R' is C(O)R^ wherein R^ is selected from H, alkyl and substituted alkyl groups and R^ and R^ are not both H;
    m is 0 or 1, such that when m is 1, an ammonium salt is provided; and
    n is a number between 0 and 10, inclusive.

87. The method according to claim 86, wherein
    R* is selected from phenyl, substituted phenyl, pyridyl, substituted pyridyl, — NR^R\, — OR^, — CN, C(R^)NR^R^ and C(R^)OR^ groups;
    R^ and R^ are members independently selected from the group consisting of H, C1-C6 alkyl and allyl; and
    R^ is =O.

88. A method for preparing a hydroxy group, said method comprising:
    (a) contacting a substrate comprising a terminal methyl group with a microorganism according to claim 50, thereby forming a hydroxymethyl group.

89. The method according to claim 88, wherein said substrate comprises an alkyl-terminal methyl group as a component of a substrate selected from arylalkyl groups, substituted arylalkyl groups, heteroarylalkyl groups, and substituted heteroarylalkyl groups.

90. The method according to claim 88, wherein said substrate has the structure:

    [structural formula; not legible in source]

wherein,
    each of said n R groups is a member selected from the group consisting of H, alkyl groups and substituted alkyl groups;
    m is a number from 0 to 10, inclusive; and
    n is a number from 0 to 5, inclusive.

91. The method according to claim 90, wherein said substrate comprises benzene substituted with a member selected from the group of straight-chain alkyl groups, branched-chain alkyl groups and combinations thereof.

92. The method according to claim 91, wherein said substrate comprises benzene substituted with a member selected from C1-C6 straight-chain, C1-C6 branched-chain alkyl and combinations thereof.

93. The method according to claim 92, wherein said alkyl group is selected from ethyl, n-propyl, i-propyl, t-butyl and combinations thereof.

94. The method according to claim 92, wherein said substrate is

    [structural formula; not legible in source]

wherein n is a number between 0 and 9, inclusive.

95. The method according to claim 92, wherein said substrate has the structure:

    [structural formula bearing (CH3)n; not legible in source]

wherein n is a number between 1 and 6, inclusive.

96. The method according to claim 88, wherein said hydroxy group is a component of a member selected from benzyl alcohol, substituted benzyl alcohol, 2-phenylethanol, substituted 2-phenylethanol, 3-phenylpropanol and substituted 3-phenylpropanol.

97. The method according to claim 88, further comprising:
    (b) contacting said hydroxymethyl group with an organism comprising an acyltransferase, thereby forming an acylated hydroxy adduct.

98. The method according to claim 97, wherein said acyltransferase exhibits one or more properties improved relative to a corresponding property of a naturally occurring acyltransferase.

99. The method according to claim 97, wherein said polypeptide of (a) and said polypeptide of (b) are expressed by the same host cell.

100. The method according to claim 88, further comprising:
    (b) contacting said hydroxymethyl group with a microorganism comprising an improved glycosyltransferase, thereby forming a glycosylated hydroxy adduct.

101. The method according to claim 100, wherein said glycosyltransferase exhibits one or more properties improved relative to a corresponding property of a naturally occurring glycosyltransferase.

102. The method according to claim 100, wherein said polypeptide of (a) and said polypeptide of (b) are expressed by the same host cell.

103. The method according to claim 88, further comprising:
    (b) contacting said hydroxy group with a microorganism comprising a dehydrogenase, thereby forming a carboxylic acid.

104. The method according to claim 103, wherein said dehydrogenase exhibits one or more properties improved relative to a corresponding property of a naturally occurring dehydrogenase.

105. The method according to claim 103, wherein said polypeptide of (a) and said polypeptide of (b) are expressed by the same host cell.

106. The method according to claim 110, further comprising contacting said carboxylic acid with a microorganism comprising an improved transferase, thereby forming a carboxylic acid ester.

107. A method for preparing a hydroxymethylene group, said method comprising contacting a substrate comprising a methylene group with a microorganism according to claim 50.

108. The method according to claim 107, wherein said substrate comprises a member selected from 3,4-dihydrocoumarin and 3,4-dihydrocoumarin residues.

109. The method according to claim 107, wherein said substrate is 3,4-dihydrocoumarin and said polypeptide converts said substrate to 4-hydroxy-3,4-dihydrocoumarin.



110. A method for preparing a hydroxyaryl group, said method comprising:
    (a) contacting a substrate comprising an aryl group with a microorganism according to claim 50.

111. The method according to claim 110, wherein said substrate comprises a group selected from aryl groups, substituted aryl groups, heteroaryl groups and substituted heteroaryl groups.



112. The method according to claim 110, further comprising:
    (b) contacting said hydroxyaryl group with an organism comprising an acyltransferase, thereby forming an acylated hydroxyaryl adduct.

113. The method according to claim 112, wherein said acyltransferase exhibits one or more properties improved relative to a corresponding property of a naturally occurring acyltransferase.

114. The method according to claim 112, wherein said polypeptide of (a) and said polypeptide of (b) are expressed by the same host cell.

115. The method according to claim 112, further comprising:
    (b) contacting said hydroxyaryl group with a microorganism comprising a glycosyltransferase, thereby forming a glycosylated hydroxyaryl adduct.

116. The method according to claim 115, wherein said glycosyltransferase exhibits one or more properties improved relative to a corresponding property of a naturally occurring glycosyltransferase.

117. The method according to claim 115, wherein said polypeptide of (a) and said polypeptide of (b) are expressed by the same host cell.

118. A screening process comprising:
    (a) introducing the library of recombinant polynucleotides into a population of test microorganisms such that the recombinant polynucleotides are expressed;
    (b) placing the organisms in a medium comprising at least one substrate; and
    (c) identifying those organisms exhibiting an improved property compared to microorganisms without the recombinant polynucleotide.

119. A bioreactor comprising:
    (a) an improved monooxygenase polypeptide;
    (b) a redox partner;
    (c) oxygen;
    (d) an oxidizable substrate.

120. The bioreactor according to claim 119, wherein said polypeptide is immobilized.

121. The bioreactor according to claim 119, wherein said polypeptide is a chimeric polypeptide.

122. The bioreactor according to claim 119, wherein said polypeptide is a P-450 polypeptide.

123. The bioreactor according to claim 122, wherein said P-450 is a peroxide-stable P-450.

124. A kit comprising:
    (a) at least one improved monooxygenase polypeptide; and
    (b) directions for using said polypeptide to carry out a chemical reaction.

125. The kit according to claim 124, wherein said at least one improved monooxygenase polypeptide is a constituent of a library of improved polypeptides.



126. A recombinant P450 polypeptide comprising a backbone domain and an active site domain, wherein at least one of said domains comprises at least two contiguous amino acids that are not contiguous in a naturally occurring P450 enzyme.

127. The recombinant P450 polypeptide according to claim 126, wherein the junction between the active site domain and the backbone domain is at a location selected from an end of the I helix and within the G-H loop.

128. The recombinant P450 polypeptide according to claim 126, wherein the F and G helices are transferred into the backbone P450.

129. A polynucleotide that encodes a recombinant P450 polypeptide according to claim 126.

130. A method of obtaining a polynucleotide that encodes a recombinant P450 polypeptide comprising a backbone domain and an active site domain, said method comprising:
    (a) recombining at least first and second forms of a nucleic acid that encodes a P450 active site domain, wherein the first and second forms differ from each other in two or more nucleotides to produce a library of recombinant active site domain encoding polynucleotides; and
    (b) linking the recombinant active site domain-encoding polynucleotide to a backbone-encoding polynucleotide so that the active site-encoding domain and the backbone-encoding domain are in-frame.

131. The method according to claim 130, wherein said backbone is derived from P450bmp.

132. The method according to claim 130, wherein said backbone domain and said recombinant active-site domain are joined at a member selected from an end of the I helix and within the G-H loop.

133. The method according to claim 130, wherein the F and G helices are transferred into the backbone P450.

134. A method of obtaining a polynucleotide that encodes a recombinant P450 polypeptide comprising a backbone domain and an active site domain, said method comprising:
    (a) recombining at least first and second forms of a nucleic acid that encodes a P450 backbone domain, wherein the first and second forms differ from each other in two or more nucleotides to produce a library of recombinant backbone domain encoding polynucleotides; and
    (b) linking the recombinant backbone domain-encoding polynucleotide to an active site-encoding polynucleotide so that the backbone-encoding domain and the active site-encoding domain are in-frame.

135. The method according to claim 134, wherein said backbone is derived from P450bmp.

136. The method according to claim 134, wherein said backbone domain and said recombinant active-site domain are joined at a member selected from an end of the I helix and within the G-H loop.

137. The method according to claim 134, wherein the F and G helices are transferred into the backbone P450.

138. A method of obtaining a polynucleotide that encodes a recombinant P450 polypeptide comprising a backbone domain and an active site domain, said method comprising:
    (a) recombining at least first and second forms of a nucleic acid that encodes a P450 active site domain, wherein the first and second forms differ from each other in two or more nucleotides to produce a library of recombinant active site domain encoding polynucleotides;
    (b) recombining at least first and second forms of a nucleic acid that encodes a P450 backbone domain, wherein the first and second forms differ from each other in two or more nucleotides to produce a library of recombinant backbone domain encoding polynucleotides; and
    (c) linking the recombinant active site domain-encoding polynucleotide to the recombinant backbone-encoding polynucleotide so that the recombinant active site-encoding domain and the recombinant backbone-encoding domain are in-frame.

139. The method according to claim 138, wherein said backbone is derived from P450bmp.

140. The method according to claim 138, wherein said backbone domain and said recombinant active-site domain are joined at a member selected from an end of the I helix and within the G-H loop.

141. The method according to claim 138, wherein the F and G helices are transferred into the backbone P450.

Description

Soybean oil accounts for about 70% of the 14 billion pounds of edible oil consumed in the United
States and is a major edible oil worldwide. It is used in baking, frying, salad dressing, margarine, and a
multitude of processed foods. In 1987/88, 60 million acres of soybean were planted in the U.S. Soybean is
the lowest-cost producer of vegetable oil, which is a by-product of soybean meal. Soybean is agronomically
well-adapted to many parts of the U.S. Machinery and facilities for harvesting, storing, and crushing are
widely available across the U.S. Soybean products are also a major element of foreign trade, since 30
million metric tons of soybeans, 25 million metric tons of soybean meal, and 1 billion pounds of soybean oil
were exported in 1987/88. Nevertheless, increased foreign competition has led to recent declines in
soybean acreage and production. The low cost and ready availability of soybean oil provide an excellent
opportunity to upgrade this commodity oil into higher-value specialty oils, both to add value to the soybean
crop for the U.S. farmer and to enhance U.S. trade.

Soybean oil derived from commercial varieties is composed primarily of 11% palmitic (16:0), 4% stearic
(18:0), 24% oleic (18:1), 54% linoleic (18:2) and 7% linolenic (18:3) acids. Palmitic and stearic acids are,
respectively, 16- and 18-carbon-long saturated fatty acids. Oleic, linoleic and linolenic are 18-carbon-long
unsaturated fatty acids containing one, two and three double bonds, respectively. Oleic acid is also referred
to as a monounsaturated fatty acid, while linoleic and linolenic acids are also referred to as polyunsaturated
fatty acids. The specific performance and health attributes of edible oils are determined largely by their fatty
acid composition.
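As a worked illustration of the composition just given, the percentages can be grouped by degree of saturation. The sketch below is not part of the original disclosure; the names SOYBEAN_OIL and class_totals are hypothetical, and the only data used are the five percentages and double-bond counts quoted above.

```python
# Illustrative sketch only: groups the commercial soybean oil composition
# quoted in the text by degree of saturation. All names are hypothetical.

# fatty acid -> (percent of total fatty acids, number of double bonds)
SOYBEAN_OIL = {
    "palmitic (16:0)": (11, 0),
    "stearic (18:0)": (4, 0),
    "oleic (18:1)": (24, 1),
    "linoleic (18:2)": (54, 2),
    "linolenic (18:3)": (7, 3),
}


def class_totals(composition):
    """Sum percentages into saturated, mono- and polyunsaturated classes."""
    totals = {"saturated": 0, "monounsaturated": 0, "polyunsaturated": 0}
    for percent, double_bonds in composition.values():
        if double_bonds == 0:
            totals["saturated"] += percent
        elif double_bonds == 1:
            totals["monounsaturated"] += percent
        else:
            totals["polyunsaturated"] += percent
    return totals


if __name__ == "__main__":
    for fatty_acid_class, percent in class_totals(SOYBEAN_OIL).items():
        print(f"{fatty_acid_class}: {percent}%")
```

On these figures the oil is 15% saturated, 24% monounsaturated and 61% polyunsaturated.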

Soybean oil is high in saturated fatty acids when compared to other sources of vegetable oil and 
contains a low proportion of oleic acid, relative to the total fatty acid content of the soybean seed. These 
characteristics do not meet important health needs as defined by the American Heart Association. 

More recent research efforts have examined the role that monounsaturated fatty acid plays in reducing
the risk of coronary heart disease. In the past, it was believed that monounsaturates, in contrast to saturates
and polyunsaturates, had no effect on serum cholesterol and coronary heart disease risk. Several recent
human clinical studies suggest that diets high in monounsaturated fat may reduce the "bad" (low-density
lipoprotein) cholesterol while maintaining the "good" (high-density lipoprotein) cholesterol. [See Mattson et
al. (1985) Journal of Lipid Research 26:194-202, Grundy (1986) New England Journal of Medicine 314:745-748
and Mensink et al. (1987) The Lancet 1:122-125, all collectively herein incorporated by reference.]
These results corroborate previous epidemiological studies of people living in Mediterranean countries
where a relatively high intake of monounsaturated fat and low consumption of saturated fat correspond with
low coronary heart disease mortality. [Keys, A., Seven Countries: A Multivariate Analysis of Death and
Coronary Heart Disease, Cambridge: Harvard University Press, 1980, herein incorporated by reference.]
The significance of monounsaturated fat in the diet was further confirmed by international researchers from
seven countries at the Second Colloquium on Monounsaturated Fats held February 26, 1987, in Bethesda,
MD, and sponsored by the National Heart, Lung and Blood Institutes [Report, Monounsaturates Use Said to
Lower Several Major Risk Factors, Food Chemical News, March 2, 1987, p. 44, herein incorporated by
reference].

Soybean oil is also relatively high in polyunsaturated fatty acids, at levels far in excess of our
essential dietary requirement. These fatty acids oxidize readily to give off-flavors and result in the reduced
performance associated with unprocessed soybean oil. The stability and flavor of soybean oil are improved
by hydrogenation, which chemically reduces the double bonds. However, the need for this processing
reduces the economic attractiveness of soybean oil.

A soybean oil low in total saturates and polyunsaturates and high in monounsaturates would provide
significant health benefits to the United States population, as well as economic benefit to oil processors.
Soybean varieties which produce seeds containing the improved oil will also produce valuable meal as
animal feed.

Another type of differentiated soybean oil is an edible fat for confectionery uses. More than 2 billion
pounds of cocoa butter, the most expensive edible oil, are produced worldwide. The U.S. imports several
hundred million dollars worth of cocoa butter annually. The high and volatile prices and uncertain supply of
cocoa butter have encouraged the development of cocoa butter substitutes. The fatty acid composition of
cocoa butter is 26% palmitic, 34% stearic, 35% oleic and 3% linoleic acids. About 72% of cocoa butter's
triglycerides have the structure in which saturated fatty acids occupy positions 1 and 3 and oleic acid
occupies position 2. Cocoa butter's unique fatty acid composition and distribution on the triglyceride
molecule confer on it properties eminently suitable for confectionery end-uses: it is brittle below 27 °C and,
depending on its crystalline state, melts sharply at 25-30 °C or 35-36 °C. Consequently, it is hard and
non-greasy at ordinary temperatures and melts very sharply in the mouth. It is also extremely resistant to
rancidity. For these reasons, producing soybean oil with increased levels of stearic acid, especially in
soybean lines containing higher-than-normal levels of palmitic acid, and reduced levels of unsaturated fatty
acids is expected to produce a cocoa butter substitute in soybean. This will add value to oil and food
processors as well as reduce the foreign import of certain tropical oils.

Only recently have serious efforts been made to improve the quality of soybean oil through plant 
breeding, especially mutagenesis, and a wide range of fatty acid composition has been discovered in 
experimental lines of soybean (Table 1). These findings (as well as those with other oilcrops) suggest that 
the fatty acid composition of soybean oil can be significantly modified without affecting the agronomic 
performance of a soybean plant. However, there is no soybean mutant line with levels of saturates less than 
those present in commercial canola, the major competitor to soybean oil as a "healthy" oil. 

TABLE 1

Range of Fatty Acid Percentages Produced by Soybean Mutants

Fatty Acid        Range of Percentages
Palmitic Acid     6-28
Stearic Acid      3-30
Oleic Acid        17-50
Linoleic Acid     35-60
Linolenic Acid    3-12
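
As a rough illustration of why the mutant ranges alone do not deliver a cocoa butter substitute, the sketch below compares the cocoa butter composition quoted earlier with the ranges in Table 1. The dictionary names and the helper function are illustrative assumptions, and the ranges are per-fatty-acid extremes observed across different mutant lines, so even an in-range value is not guaranteed to be attainable together with the others.

```python
# Ranges (percent of total fatty acids) transcribed from Table 1.
MUTANT_RANGES = {
    "palmitic": (6, 28),
    "stearic": (3, 30),
    "oleic": (17, 50),
    "linoleic": (35, 60),
    "linolenic": (3, 12),
}

# Cocoa butter composition as quoted in the text (percent).
COCOA_BUTTER = {"palmitic": 26, "stearic": 34, "oleic": 35, "linoleic": 3}


def out_of_range(target, ranges):
    """Return the fatty acids whose target level lies outside the reported range."""
    misses = {}
    for acid, level in target.items():
        low, high = ranges[acid]
        if not low <= level <= high:
            misses[acid] = (level, low, high)
    return misses


if __name__ == "__main__":
    for acid, (level, low, high) in out_of_range(COCOA_BUTTER, MUTANT_RANGES).items():
        print(f"{acid}: target {level}% is outside the mutant range {low}-{high}%")
```

Under these figures, stearic acid (too low in the mutants) and linoleic acid (too high) are the two components that fall outside the reported ranges.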



There are serious limitations to using mutagenesis to alter fatty acid composition. One is unlikely to
discover mutations a) that result in a dominant ("gain-of-function") phenotype, b) in genes that are essential
for plant growth, and c) in an enzyme that is not rate-limiting and that is encoded by more than one gene.
Even when some of the desired mutations are available in soybean mutant lines, their introgression into elite
lines by traditional breeding techniques will be slow and expensive, since the desired oil compositions in
soybean are most likely to involve several recessive genes.

Recent molecular and cellular biology techniques offer the potential for overcoming some of the
limitations of the mutagenesis approach, including the need for extensive breeding. Particularly useful
technologies are: a) seed-specific expression of foreign genes in transgenic plants [see Goldberg et al.
(1989) Cell 56:149-160], b) use of antisense RNA to inhibit plant target genes in a dominant and
tissue-specific manner [see van der Krol et al. (1988) Gene 72:45-50], c) transfer of foreign genes into elite
commercial varieties of commercial oilcrops, such as soybean [Chee et al. (1989) Plant Physiol.
91:1212-1218; Christou et al. (1989) Proc. Natl. Acad. Sci. U.S.A. 86:7500-7504; Hinchee et al. (1988)
Bio/Technology 6:915-922; EPO publication 0 301 749 A2], rapeseed [De Block et al. (1989) Plant Physiol.
91:694-701], and sunflower [Everett et al. (1987) Bio/Technology 5:1201-1204], and d) use of genes as
restriction fragment length polymorphism (RFLP) markers in a breeding program, which makes introgression
of recessive traits into elite lines rapid and less expensive [Tanksley et al. (1989) Bio/Technology
7:257-264]. However, application of each of these technologies requires identification and isolation of
commercially-important genes.

Oil biosynthesis in plants has been fairly well-studied [see Harwood (1989) in Critical Reviews in Plant
Sciences, Vol. 8(1):1-43]. The biosynthesis of palmitic, stearic and oleic acids occurs in the plastids by the
interplay of three key enzymes of the "ACP track": palmitoyl-ACP elongase, stearoyl-ACP desaturase and
acyl-ACP thioesterase. Stearoyl-ACP desaturase introduces the first double bond on stearoyl-ACP to form
oleoyl-ACP. It is pivotal in determining the degree of unsaturation in vegetable oils. Because of its key
position in fatty acid biosynthesis, it is expected to be an important regulatory step. While the enzyme's
natural substrate is stearoyl-ACP, it has been shown that it can, like its counterpart in yeast and mammalian
cells, desaturate stearoyl-CoA, albeit poorly [McKeon et al. (1982) J. Biol. Chem. 257:12141-12147]. The
fatty acids synthesized in the plastid are exported as acyl-CoA to the cytoplasm. At least three different
glycerol acylating enzymes (glycerol-3-P acyltransferase, 1-acylglycerol-3-P acyltransferase and
diacylglycerol acyltransferase) incorporate the acyl moieties from the cytoplasm into triglycerides during oil
biosynthesis. These acyltransferases show a strong, but not absolute, preference for incorporating saturated
fatty acids at positions 1 and 3 and monounsaturated fatty acid at position 2 of the triglyceride. Thus,
altering the fatty acid composition of the acyl pool will drive by mass action a corresponding change in the
fatty acid composition of the oil. Furthermore, there is experimental evidence that, because of this
specificity, given the correct composition of fatty acids, plants can produce cocoa butter substitutes [Bafor
et al. (1990) JAOCS 67:217-225].

Based on the above discussion, one approach to altering the levels of stearic and oleic acids in
vegetable oils is by altering their levels in the cytoplasmic acyl-CoA pool used for oil biosynthesis. There
are two ways of doing this genetically: a) altering the biosynthesis of stearic and oleic acids in the plastid 
by modulating the levels of stearoyl-ACP desaturase in seeds through either overexpression or antisense 
inhibition of its gene, and b) converting stearoyl-CoA to oleoyl-CoA in the cytoplasm through the expression 
of the stearoyl-ACP desaturase in the cytoplasm. 

In order to use antisense inhibition of stearoyl-ACP desaturase in the seed, it is essential to isolate the 
gene(s) or cDNA(s) encoding the target enzyme(s) in the seed, since antisense inhibition requires a high
degree of complementarity between the antisense RNA and the target gene that is expected to be absent in
stearoyl-ACP desaturase genes from other species or even in soybean stearoyl-ACP desaturase genes that 
are not expressed in the seed. 
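
The complementarity requirement described above can be illustrated with a minimal sketch. The sequences below are short hypothetical strings, not taken from SEQ ID NO:1, the function names are illustrative, and a DNA alphabet is used for simplicity even though the antisense transcript itself would be RNA.

```python
# Map each base to its Watson-Crick partner (DNA alphabet, either case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the antiparallel complementary strand of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def percent_complementarity(antisense, target):
    """Percent of positions at which the antisense strand pairs with the target.

    Both sequences must have the same length; no gaps are considered.
    """
    if len(antisense) != len(target):
        raise ValueError("sequences must be the same length")
    paired = reverse_complement(antisense).upper()
    matches = sum(a == b for a, b in zip(paired, target.upper()))
    return 100.0 * matches / len(target)


if __name__ == "__main__":
    seed_gene = "ATGGCTCTGAGA"       # hypothetical target sequence
    other_gene = "ATGGCACTTAGG"      # hypothetical diverged homolog
    antisense = reverse_complement(seed_gene)
    print(antisense)
    print(percent_complementarity(antisense, seed_gene))
    print(percent_complementarity(antisense, other_gene))
```

An antisense strand derived from the seed-expressed sequence pairs fully with it, while the diverged homolog pairs at fewer positions, which is the reason the text calls for isolating the seed-expressed gene itself.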

The purification and nucleotide sequences of mammalian microsomal stearoyl-CoA desaturases have 
been published [Thiede et al. (1986) J. Biol. Chem. 262:13230-13235; Ntambi et al. (1988) J. Biol. Chem. 
263:17291-17300; Kaestner et al. (1989) J. Biol. Chem. 264:14755-14761]. However, the plant enzyme 
differs from them in being soluble, in utilizing a different electron donor, and in its substrate specificities.
The purification and the nucleotide sequences for animal enzymes do not teach how to purify the plant 
enzyme or isolate a plant gene. The purification of stearoyl-ACP desaturase was reported from safflower 
seeds [McKeon et al. (1982) J. Biol. Chem. 257:12141-12147]. However, this purification scheme was not 
useful for soybean, either because the desaturases are different or because of the presence of other 
proteins such as the soybean seed storage proteins in seed extracts. 

The rat liver stearoyl-CoA desaturase protein has been expressed in E. coli [Strittmatter et al. (1988) J.
Biol. Chem. 263:2532-2535] but, as mentioned above, its substrate specificity and electron donors are quite
distinct from those of the plant enzyme.



SUMMARY OF THE INVENTION 

A means to control the levels of saturated and unsaturated fatty acids in edible plant oils has been
discovered. Utilizing the soybean seed stearoyl-ACP desaturase cDNA for either the precursor or enzyme,
chimeric genes are created and may be utilized to transform various plants to modify the fatty acid
composition of the oil produced. Specifically, one aspect of the present invention is a nucleic acid fragment
comprising a nucleotide sequence encoding the soybean seed stearoyl-ACP desaturase cDNA corresponding
to the nucleotides 1 to 2243 in SEQ ID NO:1, or any nucleic acid fragment substantially homologous
therewith. Preferred are those nucleic acid fragments encoding the soybean seed stearoyl-ACP desaturase
precursor or the mature soybean seed stearoyl-ACP desaturase enzyme.

Another aspect of this invention involves a chimeric gene capable of transforming a soybean plant cell
comprising a nucleic acid fragment encoding the soybean seed stearoyl-ACP desaturase cDNA operably
linked to suitable regulatory sequences producing antisense inhibition of soybean seed stearoyl-ACP
desaturase in the seed. Preferred are those chimeric genes which incorporate nucleic acid fragments
encoding the soybean seed stearoyl-ACP desaturase precursor or the mature soybean seed stearoyl-ACP
desaturase enzyme.

Yet another embodiment of the invention involves a method of producing seed oil containing modified
levels of saturated and unsaturated fatty acids comprising: (a) transforming a plant cell with a chimeric gene
described above, (b) growing sexually mature plants from said transformed plant cells, (c) screening
progeny seeds from said sexually mature plants for the desired levels of stearic acid, and (d) crushing said
progeny seed to obtain said oil containing modified levels of stearic acid. Preferred plant cells and oils are
derived from soybean, rapeseed, sunflower, cotton, cocoa, peanut, safflower, and corn. Preferred methods
of transforming such plant cells would include the use of Ti and Ri plasmids of Agrobacterium,
electroporation, and high-velocity ballistic bombardment.

DETAILED DESCRIPTION OF THE INVENTION 

The present invention describes a nucleic acid fragment that encodes soybean seed stearoyl-ACP
desaturase. This enzyme catalyzes the introduction of a double bond between carbon atoms 9 and 10 of
stearoyl-ACP to form oleoyl-ACP. It can also convert stearoyl-CoA into oleoyl-CoA, albeit with reduced
efficiency. Transfer of the nucleic acid fragment of the invention, or a part thereof that encodes a functional
enzyme, with suitable regulatory sequences into a living cell will result in the production or over-production
of stearoyl-ACP desaturase, which in the presence of an appropriate electron donor, such as ferredoxin,
may result in an increased level of unsaturation in cellular lipids, including oil, in tissues when the enzyme
[...] of a gene or a part [...] reintroduced and the endogenous gene. [...] expected to, in some cases, [...]
polymorphism marker in soybean genetic [...].

In the context of this disclosure, a number of [...] used herein, the term "nucleic
acid" refers to a large molecule which can [...] composed of monomers
(nucleotides) containing a sugar, phosphate and [...]. A "nucleic acid fragment" is
a fraction of a given nucleic acid molecule. [...] deoxyribonucleic acid (DNA) is the genetic
material while ribonucleic acid (RNA) is [...] in DNA into proteins. A
"genome" is the entire body of [...] cell of an organism. The term
"nucleotide sequence" refers to a polymer of DNA [...] single- or double-stranded,
optionally containing synthetic, non-natural or [...] capable of incorporation into DNA or
RNA polymers. As used herein, the [...] to the complementarity between the
nucleotide sequence of two nucleic acid molecules [...] amino acid sequences of two protein
molecules. Estimates of such homology are provided by either DNA-DNA or DNA-RNA hybridization under
conditions of stringency as is well understood [...]
Higgins, Eds. (1985) Nucleic Acid Hybridisation, IRL [...] U.K.]; or by the comparison of sequence
[...] "substantially homologous" refers to [...] an amino acid, but not affect the [...]
sequences are substantially homologous [...].

"Gene" refers to a nucleic acid fragment that expresses a specific protein, including regulatory
sequences preceding (5' non-coding) [...] region. "Stearoyl-ACP
desaturase gene" refers to a nucleic acid [...]
activity. "Native" gene refers to the [...] regulatory sequences. "Chimeric"
[...] sequences. "Endogenous" gene
[...] not normally found in the host organism but that is [...].

"Coding sequence" refers to a DNA sequence that [...] specific protein [...] excludes the
non-coding sequences. It may constitute an [...]
a cDNA or it may include one or more [...]. An "intron" is a
sequence of RNA which is transcribed in the primary transcript [...] through cleavage and
re-ligation of the RNA within the cell to create the mature [...] be translated into a protein.

"Translation initiation [...] of protein
[...] (mRNA [...]
translation initiation and termination codons of a coding sequence.

"RNA transcript" refers to the product [...] transcription of a DNA
sequence. When the RNA transcript is [...] sequence, it is referred to
as the primary transcript or it may be a RNA sequence derived [...]
primary transcript and is referred to as the mature [...]. Messenger RNA [...] double-stranded
[...] is without introns and that can be translated into protein by [...]
[...] is complementary to all or part of a
[...] a target gene by interfering with the [...]
contain regions of ribozyme [...]
expression. "Ribozyme" refers to a catalytic RNA and [...] sequences located upstream (5')
[...] required for proper transcription. [...]
can stimulate promoter activity. It may be an [...]
inserted to enhance the level and/or [...]. "Constitutive promoters" refers to
[...] tissue-specific or development-specific
[...] antisense inhibition [...] production of a gene product in [...]

"[...] non-coding sequences" refers to [...] mRNA processing or gene
[...] addition of [...]
may be abbreviated as RFLP. [...]

Purification [...]

[...] made from developing soybean seeds [...] ACP sepharose, and chromatofocussing on [...].
Because of the lability of the enzyme during purification, the nearly homogenous preparation [...]
ca. a few hundred-fold; the basis of this lability is not understood. Chromatofocussing resolved [...]
into two peaks of activity: the peak that eluted earlier, with an apparent pI of ca. 6, had [...]
an apparent pI of ca. 5.7. The native molecular weight of the [...]
estimated by gel filtration to be [...] kD. SDS-polyacrylamide [...] purified desaturase preparation
[...] a dimer. A smaller [...] enzyme could also be a heterodimer or that there
are different-sized isozymes [...] during storage [...]
[...]
for determining the N-terminal sequence: Arg-Ser-Gly Ser Lys
(SEQ ID NO:3).

Cloning of Soybean [...]

Based on the N-terminal sequence of [...]
nucleotide-long oligonucleotides [...] account the codon usage in selected soybean genes
and used five deoxyinosines at selected positions of ambiguity. The [...]
made in Lambda ZAP vector from poly A RNA from [...] soybean seeds. Six positively-hybridizing
plaques [...] subjected to [...] pBluescript (Stratagene) vector,
[...] presence of a helper [...]
[...] insert in plasmid pDS1 is flanked at one end (the [...] by the
unique Eco RI site and at its other end by the unique Hind III [...]. Both [...]
from the vector, pBluescript. The [...] an open
reading frame for 402 amino acids that included [...]
residues from the N-terminus of the open reading frame [...] part of this "presequence"
is the transit peptide required for [...] there are four methionines
in this presequence that are in-frame with the mature protein [...] most likely N-terminal residue
is methionine at position -32 (with N-terminal Arg [...] referred to as +1) since: a)
the N-terminal methionine in the transit peptide [...]
only one exception, is followed by alanine, [...] at position -5 is too close to the
N-terminus of the mature protein to be the [...] peptide (the smallest transit
sequence found thus far is 31 [...] that the desaturase precursor
protein consists of a 32-amino acid long [...] acid long mature protein. Based
on fusion-protein studies in which the [...] proteins is fused either to the desaturase
precursor at position -10 (Ser) or to the [...] +10 (Ile), the N-terminus of a
functional stearoyl-ACP desaturase enzyme can range at least [...] 10 amino acids from Arg at position +1
(SEQ ID NO:1).

The restriction maps of all six plasmids, though not identical, showed a common 0.7 kb Bgl II fragment
found within the coding region of the [...]. This strongly
[...] restriction maps of plasmids [...]
The results are summarized below:

Clone #    Sequence correspondence to SEQ ID NO:1    Percent Identity
1          1291-1552                                 100
2          1291-1394                                 100
3          1285-1552                                 100
4          1285-1552                                 100
5          1298-1505                                 91
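
To make the table easier to interpret, the following sketch converts each reported range into a span length and, for the clone with less than 100% identity, an approximate number of mismatched positions. It assumes that the ranges and percentages pair with clones 1 to 5 in the order listed, and the mismatch count is only approximate because the reported percentage is rounded; the names used are illustrative.

```python
# (clone number, first position, last position, percent identity) as listed in the table.
CLONES = [
    (1, 1291, 1552, 100),
    (2, 1291, 1394, 100),
    (3, 1285, 1552, 100),
    (4, 1285, 1552, 100),
    (5, 1298, 1505, 91),
]


def span_length(first, last):
    """Number of nucleotides in an inclusive 1-based range."""
    return last - first + 1


def approx_mismatches(first, last, percent_identity):
    """Approximate count of non-identical positions implied by a percent identity."""
    return round(span_length(first, last) * (100 - percent_identity) / 100)


if __name__ == "__main__":
    for clone, first, last, pct in CLONES:
        length = span_length(first, last)
        mism = approx_mismatches(first, last, pct)
        print(f"clone {clone}: {length} nt compared, about {mism} mismatches")
```

On these figures, clone 5 differs from SEQ ID NO:1 at roughly 19 of the 208 compared positions, while the other four clones are identical over the regions compared.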






Thus, while the claimed sequence (SEQ ID NO:1) most likely represents the predominantly-expressed
[...] other stearoyl-ACP desaturase gene [...]
is shown in SEQ ID NO:2.

As expected, comparison of the deduced amino-acid sequences for soybean stearoyl-ACP desaturase
and the rat microsomal stearoyl-CoA desaturases did not reveal any significant homology.
In vitro recombinant DNA techniques were used to make two fusion proteins:
a) a recombinant plasmid, pGEXB, that encodes a ca. 66 kD fusion protein consisting of a 28 kD
glutathione-S-transferase (GST) protein fused at its C-terminus to the ca. 38 kD desaturase precursor
protein at amino acid residue -10 from the N-terminus of the mature enzyme (Arg, +1) (SEQ ID NO:1).
Extracts of E. coli cells harboring pGEXB, grown under conditions that induce the synthesis of the fusion
protein, show stearoyl-ACP desaturase activity and expression of a ca. 66 kD fusion protein that
cross-reacts with antibody made against soybean stearoyl-ACP desaturase and that binds to a
glutathione-agarose affinity column. The affinity column can be used to purify the fusion protein to
near-homogeneity in a single step. The desaturase moiety can be cleaved off in the presence of thrombin
and separated from the GST by re-chromatography on the glutathione-agarose column; and

b) a recombinant plasmid, pNS2, that encodes a ca. 42 kD fusion protein consisting of 4 kD of the
N-terminus of β-galactosidase fused at its C-terminus to the amino acid residue at position +10 (Ile) from
the N-terminus of the mature desaturase protein (Arg, +1) (SEQ ID NO:1). Extracts of E. coli cells
harboring pNS2 express a ca. 42 kD protein that cross-reacts with antibody made against soybean
stearoyl-ACP desaturase and show stearoyl-ACP desaturase activity.
E. coli (pGEXB) can be used to purify the stearoyl-ACP desaturase for use in structure-function studies
on the enzyme, in immobilized cells or in extracellular desaturations [see Ratledge et al. (1984) Eds.,
Biotechnology for the Oils and Fats Industry, American Oil Chemists' Society]. E. coli (pNS2) can be used
to express the desaturase enzyme in vivo. However, for in vivo function it may be necessary to introduce an
electron donor, such as ferredoxin and NADPH:ferredoxin reductase. The ferredoxin gene has been cloned
from a higher plant [Smeekens et al. (1985) Nucleic Acids Res. 13:3179-3194] and human ferredoxin has
been expressed in E. coli [Coghlan et al. (1989) Proc. Natl. Acad. Sci. USA 86:835-839]. Alternatively, one
skilled in the art can express the mature protein in microorganisms using other expression vectors
described in the art [Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring
Harbor Laboratory Press; Milman (1987) Meth. Enzymol. 153:482-491; Duffaud et al. (1987) Meth. Enzymol.
153:492-507; Weinstock (1987) Meth. Enzymol. 154:156-163; E.P.O. Publication 0 295 959 A2].

The fragment of the instant invention may be used, if desired, to isolate substantially homologous
stearoyl-ACP desaturase cDNAs and genes, including those from plant species other than soybean.
Isolation of homologous genes is well-known in the art. Southern blot analysis reveals that the soybean
cDNA for the enzyme hybridizes to several, different-sized DNA fragments in the genomic DNA of tomato,
rapeseed (Brassica napus), soybean, corn (a monocotyledonous plant) and Arabidopsis (which has a very
simple genome). The Southern blot of corn DNA reveals that the soybean cDNA can also hybridize
non-specifically, which may make the isolation of the corn gene more difficult. Although we do not know how
many different genes or "pseudogenes" (non-functional genes) are present in any plant, it is expected to be
more than one, since stearoyl-ACP desaturase is an important enzyme. Moreover, plants that are
amphidiploid (that is, derived from two progenitor species), such as soybean, rapeseed (B. napus), and
tobacco will have genes from both progenitor species.

The nucleic acid fragment of the instant invention encoding soybean seed stearoyl-ACP desaturase
cDNA or a coding sequence derived from other cDNAs or genes for the enzyme, with suitable regulatory
sequences, can be used to overexpress the enzyme in transgenic soybean as well as other transgenic
species. Such a recombinant DNA construct may include either the native stearoyl-ACP desaturase gene or
a chimeric gene. One skilled in the art can isolate the coding sequences from the fragment of the invention
by using and/or creating sites for restriction endonucleases, as described in Sambrook et al. [(1989)
Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press]. Of particular utility
are sites for Nco I (5'-CCATGG-3') and Sph I (5'-GCATGC-3') that allow precise removal of coding
sequences starting with the initiating codon ATG. The fragment of the invention has an Nco I recognition
sequence at nucleotide positions 1601-1606 (SEQ ID NO:1) that is 357 bp after the termination codon for
the coding sequence. For isolating the coding sequence of stearoyl-ACP desaturase precursor from the
fragment of the invention, an Nco I site can be engineered by substituting nucleotide A at position 69 with
C. This will allow isolation of the 1533 bp Nco I fragment containing the precursor coding sequence. The
expression of the mature enzyme in the cytoplasm is expected to desaturate stearoyl-CoA to oleoyl-CoA.
For this it may be necessary to also express the mature ferredoxin in the cytoplasm, the gene for which has
been cloned from plants [Smeekens et al. (1985) Nucleic Acids Res. 13:3179-3194]. For isolating the coding
sequence for the mature protein, a restriction site can be engineered near nucleotide position 164. For
example, substituting nucleotide G with nucleotide C at position 149 or position 154 would result in the
creation of an Nco I site or an Sph I site, respectively. This will allow isolation of a 1453 bp Nco I fragment or a
1448 bp Sph I-Nco I fragment, each containing the mature protein sequence. Based on fusion protein
studies, the N-terminus of the mature stearoyl-ACP desaturase enzyme is not critical for enzyme activity.
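
The fragment sizes quoted above can be checked with simple coordinate arithmetic. The sketch below assumes that the substituted base becomes the second C of CCATGG (positions 69 and 149) or the final C of GCATGC (position 154); these placements are inferences that happen to reproduce the stated sizes, not details given in the text. It relies on the known cleavage positions of Nco I (C^CATGG) and Sph I (GCATG^C) and counts top-strand bases only. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CutSite:
    """A restriction site located by the 1-based position of its first base."""
    enzyme: str
    first_base: int
    bases_before_cut: int  # recognition-sequence bases 5' of the top-strand cut

    @property
    def last_base_before_cut(self) -> int:
        return self.first_base + self.bases_before_cut - 1


NCO_I = 1  # Nco I cuts C^CATGG
SPH_I = 5  # Sph I cuts GCATG^C


def fragment_length(upstream: CutSite, downstream: CutSite) -> int:
    """Top-strand length of the fragment released between two cuts."""
    return downstream.last_base_before_cut - upstream.last_base_before_cut


# Native Nco I site at positions 1601-1606 (stated in the text).
native_nco = CutSite("Nco I", 1601, NCO_I)
# Assumed placements of the engineered sites (see the lead-in).
precursor_nco = CutSite("Nco I", 69 - 1, NCO_I)   # C at 69 is the 2nd base of CCATGG
mature_nco = CutSite("Nco I", 149 - 1, NCO_I)     # C at 149 is the 2nd base of CCATGG
mature_sph = CutSite("Sph I", 154 - 5, SPH_I)     # C at 154 is the last base of GCATGC

if __name__ == "__main__":
    print(fragment_length(precursor_nco, native_nco))  # 1533
    print(fragment_length(mature_nco, native_nco))     # 1453
    print(fragment_length(mature_sph, native_nco))     # 1448
```

Under these assumptions the three computed lengths match the 1533 bp, 1453 bp and 1448 bp fragments named in the text.
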
Antisense RNA has been used to inhibit plant target genes in a dominant and tissue-specific manner
[see van der Krol et al. (1988) Gene 72:45-50; Ecker et al. (1986) Proc. Natl. Acad. Sci. USA 83:5372-5376;
van der Krol et al. (1988) Nature 336:866-869; Smith et al. (1988) Nature 334:724-726; Sheehy et al. (1988)
Proc. Natl. Acad. Sci. USA 85:8805-8809; Rothstein et al. (1987) Proc. Natl. Acad. Sci. USA 84:8439-8443;
Cornelissen et al. (1988) Nucl. Acids Res. 17:833-843; Cornelissen (1989) Nucl. Acids Res. 17:7203-7209;
Robert et al. (1989) Plant Mol. Biol. 13:399-409].

The use of antisense inhibition of the seed enzyme would require isolation of the coding sequence for
genes that are expressed in the target tissue of the target plant. Thus, it will be more useful to use the
fragment of the invention to screen seed-specific cDNA libraries, rather than genomic libraries or cDNA
libraries from other tissues, from the appropriate plant for such sequences. Moreover, since there may be
more than one gene encoding seed stearoyl-ACP desaturase, it may be useful to isolate the coding
sequences from the other genes from the appropriate crop. The genes that are most highly expressed are
the best targets for antisense inhibition. The level of transcription of different genes can be studied by
known techniques, such as run-off transcription.

For expressing antisense RNA in soybean seed from the fragment of the invention, the entire fragment
of the invention (that is, the entire cDNA for soybean stearoyl-ACP desaturase from the unique Eco RI to
Hind III sites in plasmid pDS1) may be used. There is evidence that the 3' non-coding sequences can play
an important role in antisense inhibition [Ch'ng et al. (1989) Proc. Natl. Acad. Sci. USA 86:10006-10010].
There have also been examples of using the entire cDNA sequence for antisense inhibition [Sheehy et al.
(1988) Proc. Natl. Acad. Sci. USA 89:8439-8443]. The Hind III and Eco RI sites can be modified to facilitate
insertion of the sequences into suitable regulatory sequences in order to express the antisense RNA.

A preferred host soybean plant for the antisense RNA inhibition of stearoyl-ACP desaturase for
producing a cocoa butter substitute in soybean seed oil is a soybean plant containing higher-than-normal
levels of palmitic acid, such as the A19 double mutant, which is being commercialized by Iowa State University
Research Foundation, Inc. (315 Beardshear, Ames, Iowa 50011).

A preferred class of heterologous hosts for the expression of the coding sequence of stearoyl-ACP
desaturase precursor or the antisense RNA are eukaryotic hosts, particularly the cells of higher plants.
Particularly preferred among the higher plants are the oilcrops, such as soybean (Glycine max), rapeseed
(Brassica napus, B. campestris), sunflower (Helianthus annuus), cotton (Gossypium hirsutum), corn (Zea
mays), cocoa (Theobroma cacao), and peanut (Arachis hypogaea). Expression in plants will use regulatory
sequences functional in such plants.

The expression of foreign genes in plants is well-established [De Blaere et al. (1987) Meth. Enzymol.
153:277-291]. The origin of the promoter chosen to drive the expression of the coding sequence or the
antisense RNA is not critical as long as it has sufficient transcriptional activity to accomplish the invention
by increasing or decreasing, respectively, the level of translatable mRNA for stearoyl-ACP desaturase in the
desired host tissue. Preferred promoters include strong plant promoters (such as the constitutive promoters
derived from Cauliflower Mosaic Virus that direct the expression of the 19S and 35S viral transcripts [Odell
et al. (1985) Nature 313:810-812; Hull et al. (1987) Virology 86:482-493]), small subunit of ribulose
1,5-bisphosphate carboxylase [Morelli et al. (1985) Nature 315:200; Broglie et al. (1984) Science 224:838;
Herrera-Estrella et al. (1984) Nature 310:115; Coruzzi et al. (1984) EMBO J. 3:1671; Faciotti et al. (1985)
Bio/Technology 3:241], maize zein protein [Matzke et al. (1984) EMBO J. 3:1525], and chlorophyll a/b
binding protein [Lampa et al. (1986) Nature 316:750-752].

Depending upon the application, it may be desirable to select inducible promoters and/or tissue- or 
development-specific promoters. Such examples include the light-inducible promoters of the small subunit 
of ribulose 1,5-bisphosphate carboxylase genes (if the expression is desired in tissues with photosynthetic
function). 

Particularly preferred tissue-specific promoters are those that allow seed-specific expression. This may
be especially useful, since seeds are the primary source of vegetable oils and also since seed-specific
expression will avoid any potential deleterious effect in non-seed tissues. Examples of seed-specific
promoters include but are not limited to the promoters of seed storage proteins, which can represent up to
90% of total seed protein in many plants. The seed storage proteins are strictly regulated, being expressed
almost exclusively in seeds in a highly tissue-specific and stage-specific manner [Higgins et al. (1984) Ann.
Rev. Plant Physiol. 35:191-221; Goldberg et al. (1989) Cell 56:149-160]. Moreover, different seed storage
proteins may be expressed at different stages of seed development.

Expression of seed-specific genes has been studied in great detail [see reviews by Goldberg et al.
(1989) Cell 56:149-160 and Higgins et al. (1984) Ann. Rev. Plant Physiol. 35:191-221]. There are currently
numerous examples for seed-specific expression of seed storage protein genes in transgenic dicotyledonous
plants. These include genes from dicotyledonous plants for bean β-phaseolin [Sengupta-Gopalan et al.
(1985) Proc. Natl. Acad. Sci. USA 82:3320-3324; Hoffman et al. (1988) Plant Mol. Biol. 11:717-729], bean
lectin [Voelker et al. (1987) EMBO J. 6:3571-3577], soybean lectin [Okamuro et al. (1986) Proc. Natl. Acad.
Sci. USA 83:8240-8244], soybean Kunitz trypsin inhibitor [Perez-Grau et al. (1989) Plant Cell 1:1095-1109],
soybean β-conglycinin [Beachy et al. (1985) EMBO J. 4:3047-3053; Barker et al. (1988) Proc. Natl. Acad.
Sci. USA 85:458-462; Chen et al. (1988) EMBO J. 7:297-302; Chen et al. (1989) Dev. Genet. 10:112-122;
Naito et al. (1988) Plant Mol. Biol. 11:109-123], pea vicilin [Higgins et al. (1988) Plant Mol. Biol. 11:683-695],
pea convicilin [Newbigin et al. (1990) Planta 180:461], pea legumin [Shirsat et al. (1989) Mol. Gen. Genetics
215:326]; rapeseed napin [Radke et al. (1988) Theor. Appl. Genet. 75:685-694] as well as genes from
monocotyledonous plants such as for maize 15-kD zein [Hoffman et al. (1987) EMBO J. 6:3213-3221], and
barley β-hordein [Marris et al. (1988) Plant Mol. Biol. 10:359-366] and wheat glutenin [Colot et al. (1987)
EMBO J. 6:3559-3564]. Moreover, promoters of seed-specific genes operably linked to heterologous coding
sequences in chimeric gene constructs also maintain their temporal and spatial expression pattern in
transgenic plants. Such examples include Arabidopsis thaliana 2S seed storage protein gene promoter to
express enkephalin peptides in Arabidopsis and B. napus seeds [Vandekerckhove et al. (1989)
Bio/Technology 7:929-932], bean lectin and bean β-phaseolin promoters to express luciferase [Riggs et al.
(1989) Plant Sci. 63:47-57], and wheat glutenin promoters to express chloramphenicol acetyl transferase
[Colot et al. (1987) EMBO J. 6:3559-3564].

Of particular use in the expression of the nucleic acid fragment of the invention will be the heterologous
promoters from several extensively-characterized soybean seed storage protein genes such as those for the
Kunitz trypsin inhibitor [Jofuku et al. (1989) Plant Cell 1:1079-1093; Perez-Grau et al. (1989) Plant Cell
1:1095-1109], glycinin [Nielson et al. (1989) Plant Cell 1:313-328], β-conglycinin [Harada et al. (1989) Plant
Cell 1:415-425]. Promoters of genes for α- and β-subunits of soybean β-conglycinin storage protein will be
particularly useful in expressing the mRNA or the antisense RNA to stearoyl-ACP desaturase in the
cotyledons at mid- to late-stages of seed development [Beachy et al. (1985) EMBO J. 4:3047-3053; Barker
et al. (1988) Proc. Natl. Acad. Sci. USA 85:458-462; Chen et al. (1988) EMBO J. 7:297-302; Chen et al.
(1989) Dev. Genet. 10:112-122; Naito et al. (1988) Plant Mol. Biol. 11:109-123] in transgenic plants, since: a)
there is very little position effect on their expression in transgenic seeds, and b) the two promoters show
different temporal regulation: the promoter for the α-subunit gene is expressed a few days before that for
the β-subunit gene; this is important for transforming rapeseed where oil biosynthesis begins about a week
before seed storage protein synthesis [Murphy et al. (1989) J. Plant Physiol. 135:63-69].

Also of particular use will be promoters of genes expressed during early embryogenesis and oil 
biosynthesis. The native regulatory sequences, including the native promoter, of the stearoyl-ACP 
desaturase gene expressing the nucleic acid fragment of the invention can be used following its isolation by 
those skilled in the art. Heterologous promoters from other genes involved in seed oil biosynthesis, such as 
those for B. napus isocitrate lyase and malate synthase [Comai et al. (1989) Plant Cell 1:293-300], 
Arabidopsis ACP [Post-Beittenmiller et al. (1989) Nucl. Acids Res. 17:1777], B. napus ACP [Safford et al. 
(1988) Eur. J. Biochem. 174:287-295], and B. campestris ACP [Rose et al. (1987) Nucl. Acids Res. 15:7197] 
may also be used. The partial protein sequences for the relatively-abundant enoyl-ACP reductase and 
acetyl-CoA carboxylase are published [Slabas et al. (1987) Biochim. Biophys. Acta 877:271-280; Cottingham 
et al. (1988) Biochim. Biophys. Acta 954:201-207] and one skilled in the art can use these sequences to 
isolate the corresponding seed genes with their promoters. 

A proper level of expression of stearoyl-ACP mRNA or antisense RNA may require the use of different 
chimeric genes utilizing different promoters. Such chimeric genes can be transferred into host plants either 
together in a single expression vector or sequentially using more than one vector. 

It is envisioned that the introduction of enhancers or enhancer-like elements into either the native 
stearoyl-ACP desaturase promoter or into other promoter constructs will also provide increased levels of 
primary transcription for antisense RNA or mRNA for stearoyl-ACP desaturase to accomplish the 
invention. This would include viral enhancers such as that found in the 35S promoter [Odell et al. (1988) 
Plant Mol. Biol. 10:263-272], enhancers from the opine genes [Fromm et al. (1989) Plant Cell 1:977-984], or 
enhancers from any other source that result in increased transcription when placed into a promoter operably 
linked to the nucleic acid fragment of the invention. 

Of particular importance is the DNA sequence element isolated from the gene for the α-subunit of 
β-conglycinin that can confer 40-fold seed-specific enhancement to a constitutive promoter [Chen et al. (1988) 
EMBO J. 7:297-302; Chen et al. (1989) Dev. Genet. 10:112-122]. One skilled in the art can readily isolate 
this element and insert it within the promoter region of any gene in order to obtain seed-specific enhanced 
expression in transgenic plants. 

a ... 

form, the invention is based on modifying P'-IV^eTTX "e^ ^esaturase 

that may be required tor the proper -P^^^/^f ^J^^^^^^^^ soybean 
to accomplish the invention. Th,s wouW '"^'"^^^ desaturase gene, the 3' end 

stearoyl-ACP desaturase genejs). the ^^'f ^'^^^i^^^^^ virus transcripts, the 3' end 

from viral genes such as «nd ^g.^^^^f .^ri^^^^^ carboxylase or chlorophyll aA, 

from the opine synthesis genes, the 3 ends of nouios ^ ^^^^^^ 

those based on transfom,ation vectors based on ^^'^^f^. derivTdTeJors tr^^^^ 
particularly preferred to use the binary type cotton and rape 

Eigher plants, including monocotyledonous and Organ Culture 8:3: 

. [Pacciotti at ^.(1985) Biotechnology 3:241; Byree^^^ ( 987 P'an Ce 1 ^^^^^ ^^^^^^ 

Sukhapinda et al. (1987) Plant Mol. B.ol. 8.209-216. ^J-J^yi^l ., those skilled in the art. 
Once transformed, the cells can be regenerated by those skilled in the art. 

aenes into commer- 

Of particular relevance are the recently '='«^°"^,^,'^^f sunflower 

^-ra^K^^^ 

^ ^^^^^^^^^ breeding has been .ell- 

the invention has been mapped to four diffe^nt k>c. o-J^^^^'^^^'-^^ a RFLP m'^rlTfor tUs linked to 
^ Biochem.. supplement HE p. 291. act ^^^.^^^ .^^^^^^^^ acid. The nucleic acid 

these mapped loci. More preferably these ^^^J '^'^^^^^ gene from variant 

fragment of the invention can also be used to isolate ^« f ^ ^^,^3^ 'genes will reve^ 

(including mutant) soybeans with altered steanc ^^'^ '^^^^"^^^^^^^^^^^^ designed 
nucleotide differences from ^^^'^f ^^^^^^^^ Z variaSon in stearic and oleic 

^ :r Cn^^rbaTd " "ie^ceTrar:: . ....o. may be used as molecular 

sequence of a soybean seed -aroy.™ 
and'^t^L reading frame that includes t^^^^^^^^ tra^o ^c^eJ 

ao desaturase. The nucleotide sequence -ads fm^^ ?5^990)TnCSed reference herein. Nucleotide 1 
defined by the Commissioner. 1114 OG 29 (May 15. 1990) .^^^ y ^^^^^^^ 

is the first nucleotide of the cDNA insert f ^^^^^^ encods the soybean seed stearoyl-ACP 

the last nucleotide of the cDNA insert of plasm d P'f ^1 J'.^ f 166 to 168 are the 

desaturase. Nucleotides 70 to 72 are ^-^f P^^^'^^^^f"/^^^^^^^^ are the termination 

55 codon for the N-terminal amino acid of the P"" f ""t^'J'J,^^^^^^^^^ i246 to 2243 are the 3" 

codon, nucleotides 1 to ^^^^^^^^^^^^^^ seed stearoyl-ACP 

ss^c^Nr™ - ^) - 
3' non-coding sequence. SEQ ID NO:3 represents the N-terminal sequence of the purified soybean seed 
stearoyl-ACP desaturase. SEQ ID NO:4 represents the degenerate coding sequence for amino acids 5 
through 16 of SEQ ID NO:3. SEQ ID NO:5 represents a complementary mixture of degenerate 

oligonucleotides to SEQ ID N0:4 foibwina EXAMPLES, in which all parts and percentages 

The present invention is further defined .n the ; ^^^^^^ ^e understood that these 
EXAMPLE 1 
ISOLAIlOh^^ 

PREPARATION OF [9,10-3H]-STEAROYL-ACP 

Purification of Acyl Carrier Protein (ACP) from E. coli 






TO frozen E. cel. paste. (0.5 .g of .2 P--^-^^^^^^ 
obtained from Gran Processing Corp. "^"^JJ^Jf J^™^'^ ^ suspension was thawed in a water 
glycine, and 0.25 M in EDTA. Ten mL of 1 M '^^^^^^^^ ^„ ^ bath, made to 10 mM in 2- 
Sath at 50 • C. AS the suspension approached 37 ^ « -n,e suspension was stirred for 

mercaptoethanol and 20 mg of DNAse and 50 '^^^^"^'^^^ j^^t^^^ was adjusted to 1 L and the 
2 h. then sheared by three 20 second bursts .n ^^^^""S^^'J^'^^J"^^^ centrifuged at 90.000xg for 2 
.ixiure was centrifuged at 24.000xg for 30 n^n. ^^^^^ ^^^f ^^^^S?^^^^^^^^^^^^ see below) and the 
h. The resultant high-speed pelle was saved °r f^^^^^ "^^ ^^J,^ ^hen made to 50«/. in 2- 
supernatant was adjusted to pH 6.1 by the °* ^ o'C. The resulting precipitate was 

propanol by the slow addition of cold 2-propanol ^ The resultant supernatant was 

allowed to settle for 2 h and then i'^JJ"^^^^^^^^^^^ of DEAE-Sephacel<. which 

adjusted to pH 6.8 with KOH and «PPf ^ ",U^n to a 4^^^^^^ ^.^^ p„ 3 3 

had been equilibrated in 10 mM MES. P" ^f.^^^^^^"'"^ Twenty mL fractions were collected 

eluted with 1 L of a gradient of LiCI from 0 to 1 T M ^^^^^^ of every second fraction to a lane of a 
and the location of eluted ACP was f f -"'"^f^^^ ^^^^^^^^ Ip^GE^FrJous eluting at about 0.7 M UCl 

:srnr:::ran^^^^ 

Purification of Acyl-ACP Synthase 

Membrane pellets resulting from ^ l^f^^^^^ %7oZ T^^^ 
380 mL of 50 mM Tris-CI. pH ^^^^^^^^^^^^^^^^ Tris-C. pH 8.0. to a protein 

resultant supernatant was discarded and the Pe"ets resuspena 

concentration of 12 mg/mL The membrane P«";?"^2n af^^^^^^^ The protein in the 

MgCb. and stirred at O'C for 20 mm ^f<^^^J^^'^;f^^ J S in 50 mM Tris-CI. pH 8.0 and. then, 
resultant supernatant was diluted to 5 '^f "^^^ J'^^ ^ ^ ° , ^^ng with an equimolar amount of 

made to 5 mM ATP by the additoon of solid '^TP (^joj^^^^^^^ temperature reached 53'C and was 
NaHCOs. The solution was warmed m J S^-C bathjtiH^^^ he sol"«°" ^^P'^'^ °" 

then maintained at between 53-C and 55-C ^^'^"^nU^r^ ZVe^t treatment step was loaded directly 
and centrifuged at 15.000xg for 15 mm. The ^"P«^^^^^"*;^^^^^ TrLci. pH 8.0. and 2% 

, onto a column of 7 mL Blue Sapharose® 4B which had been e^^^^^^^^ P^^ ^3 ^ ^^^1 

Triton X-100. The column was washed ^^^^^^^ buffer. Active fractions were 

in the same buffer and ^^-f^^^^^^^^^^^ to 3 mL settled-volume 

assayed for the synthesis of acyl-ACP. 'lescrroea oeiow. hydroxylapatite was 

of hydroxylapatite equilibrated in 50 mM T;;s-CI. pH 8.0 2A Tn^^^^^^ ^.^^^ 

s collected by centrifugation. washed twice with f ^ Jj^^^^^^^^^ 7.5. 2% Triton X-IOO®. The 
^^^T^^V^^^TT^^ with^ 30 .0 membrane filtration 
concentrator (Amicon) to 1.5 mL. 
Synthesis of [9,10-3H]-Stearoyl-ACP 

■ ^/l -»AR iin was mixed with a solution of [9.10-3H]stearate 
A solution of stearic add in m^hanol (1 mM. ^^^^[^^^''^^p preparation described above (1-15 
(Amersham) containing 31.6 uCi of 3H and dned n a 9^as^ -al^^e^A^ ^^P^P^^ ^^^^^^ 
mL. 32 nmoles) was added along with 0.1 mL of 0^ m a ^g^^^ion was mixed 

Tnd 0 2 mL of 13% Triton X-100® In 0.5 M Tr.s-CI. P"/;° J*^^ ° ' 1 h at37«C, a 10 .^L aliquot 

rroughiy and 0.3 mL of the -Vi-ACP ^^^^^^^^^^^^^^ ^^he^ expensively with chloro- 

was taken and dried on a small J^P^^^^^^^ ^^^^^^ the disc was taken as a measure of 
form:methanol:acetic acid (8 2: . v:v:v) and 7^;^^^ ^he reaction did not proceed furthenn 

stearoyl-ACP. At 1 h about 67% of frie ACP had been cons ^^^^^^ ^^^^^^ ^^E- 

the next 2 h. The reaction mix was diluted 1 ^ J ^0 mM ^.g^ed in sequence with 5 mL of 20 

Sephacel® column equilibrated in the T.^"^^^^^^ and eluted with 0.5 M LiCl in 20 mM 

mM Tris-Cl. pH 8.0. 5 mL of 80'/o 2-P^0P^"°^" ^0 m^^^^^^^^ r3 mL column of octyl-sepharose® CL-4B 
Tris-CI pH 8.0. The column eluate was passed d. ectly o™o » j m ^.^^ ^^^^^ 2- 

was washed with 10 mL of ^0 potass^m^h^^^^^^^^^^ ^^^^ 3h 

pr-iiJbrorsrT.^^^^^^^^^^ 

[3H]stearoyl-ACP at 0.9 mCi/umole. 

PREPARATION^ 

Synthesis of N-hexadecyliodoacetamide 

. . . o t nu rnnlfid to 4'C. and 2.83 mmoles 
1-Hexadecylamine (3.67 mmole) was dis^^l^ed ;nJ4^«;^^^^^^ solution. The solution was 

of iodoacetic anhydride in 11.3 mL of ^^^^^^^^ ^f^^^^ZLe was diluted to about 50 mL with 
warmed to room temperature and held ^^^'^J^' solution and then 2 times with 
CHaCla and washed 3 times (25 mL) w;^h sat,^^^^^ s^^^^^^^ ^ndervacuum and passed through 25 mL of 
water. The volume of the solution was reduced 0 ^bou^_^rnL u ^^^^^^ ^.^ ^.^^^^ ^^0 mg 

silica in diethyl ether. The eluate "^^i*;/" ^3% The 300 MHz NMR spectra of the 

(2.03 mmoles) of the N-hexadecyliodoacetamide (71.8% yield). 
product was consistent with the expected structure. 

Synthesis of N-Hexadecylacetamido-S-ACP 

. . coil ACP p^epared as above (10 mg in 2 - of 50 mM^^^^^^^^ 

mMDfnor2h. The solution was made to 10ATCA.h^^^^^^ redissolved in 3 mL of 50 mM 

resultant pellet was washed ^ x 2 ^^^-^^J l^^^^^ ^^uLd to 7.5 with 1 M KOH and 3 mL of N- 
potassium phosphate buffer. The pH of the ACP solution ^^^^ precipitate of the N-hex- 

Sexadecyliodoacetamide (3 mM ^'P^^""' J^^^^^ 
« adecyliodoacetamide -f ^^^f J^^^J^^^^ SeS ^p oximately 80% conversion to an ACP 
for 6 h. SDS-PAGE on 20% ^"^^^''^^J'f;J^\Tr^ 

species of intermediate mobility between *es^a^ g^ e^^^^ ^^^^.^^^ ^^^b with 

hexadecyliodoacetamide was removed from the [^f "o"J"'^"y 
genS Sxing to avoid precipitation of the protem at the interface. 

" ....pnnn nf N-HexadecylacetHmi--'-^'' CNBr-activated Sepharose^ 

cyanogen bromide-activated Sepharose. 4B (Pharmaci. 2 g) was suspended j;^-;^^^^^'^ 
extensively washed by filtration and resuspension -n l rnM HQ ^ ^ 2 M 

50 8.3. The N-hexadecylacetamido-S-ACP P^^P^f sfr^o^^ Ib (about 5 mL) was added to the 

NaHC03. pH 8.3. The filtered cyanogen f:^^^^^''^'^^ volume of 10 mL with the 0.1 M 
N-hexadecylacetamldo-S-ACP solujon. he ^^^"^""^ ^l^^ ^^r 6 h. Protein remaining in solution 
NaHCOa. pH 8.3. and mixed by ^"'^'"'^^^''^^^^^^ 

(Bradford assay) indicated approx-mately 85% ^'"^'"a^^^^^^^^^^ m ethanolamine adjusted to pH 8.5 

55 washed once with the 0.1 M NaHCOa pH 8.3 a d^^^^^^^^^^^^ ^^^^ ,y centrifugation and re- 

with HCl. The suspension was allowed to s^and at 4 o ove^ g ^ ^^^^^^ 3 3 ^ 5 ^ ,„ 
mM bis-tris propane-Cl (BTP-Cl), pH 7.2, before use. 

STEAROYL-ACP DESATURASE ASSAY 

Stearoyl-ACP desaturase was assayed as described by McKeon et al. [(1982) J. Biol. Chem. 
257:12141-12147] except for using [9,10-3H]-stearoyl-ACP. Use of the tritiated substrate allowed assaying 
the enzyme activity by release of tritium as water, although the assay based on the tritium release 
underestimates desaturation by a factor of approximately 4 relative to that observed using 14C-stearoyl-ACP 
by the method of McKeon et al. [(1982) J. Biol. Chem. 257:12141-12147], apparently because not all tritium 
is at carbons 9 and 10. Nevertheless, this modification makes the enzyme assay more sensitive, faster and 
more reliable. The reaction mix consisted of enzyme in 25 uL of 230 ug/mL bovine serum albumin (Sigma), 
49 ug/mL catalase (Sigma), 0.75 mM NADPH, 7.25 uM spinach ferredoxin, 0.35 uM spinach 
ferredoxin:NADPH oxidoreductase, 50 mM Pipes, pH 6.0, and 1 uM [9,10-3H]-stearoyl-ACP (0.9 
mCi/umole). All reagents, except for the Pipes buffer, labeled substrate and enzyme extract, were 
preincubated in a volume of 7.25 uL at pH 8.0 at room temperature for 10 min before adding 12.75 uL of the 
Pipes buffer and labeled substrate stocks. The desaturase reaction was usually terminated after 5 min by 
the addition of 400 uL of 10% trichloroacetic acid and 50 uL of 10 mg/mL bovine serum albumin. After 5 min 
on ice, the protein precipitate was removed by centrifugation at 13,000xg for 5 min. An aliquot of 425 uL 
was removed from the resultant supernatant and extracted twice with 2 mL of hexane. An aliquot of 375 uL 
of the aqueous phase following the second hexane extraction was added to 5 mL of ScintiVerse® Bio HP 
(Fisher) scintillation fluid and used to determine radioactivity released as tritium. 
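
The conversion from counted radioactivity to substrate equivalents can be sketched as follows. This is a minimal sketch and not part of the described method: it assumes the scintillation counts have already been converted to dpm, that the quenched reaction volume is 475 uL (25 uL of reaction plus 400 uL of trichloroacetic acid plus 50 uL of albumin), and that the 375 uL counted is a proportional share of that volume. The function names are hypothetical, and no correction is applied for the approximately 4-fold underestimate noted above.

```python
# Sketch only: function names and the volume bookkeeping are assumptions,
# not part of the assay as described in the text.

DPM_PER_MICROCURIE = 2.22e6  # 1 Ci = 3.7e10 disintegrations/s = 2.22e12 dpm


def dpm_per_pmol(specific_activity_mci_per_umol):
    """Convert a specific activity in mCi/umol to dpm per pmol of substrate."""
    # mCi/umol -> uCi/pmol is a factor of 1e-3 (x1000 for mCi->uCi, /1e6 for umol->pmol)
    microcurie_per_pmol = specific_activity_mci_per_umol * 1e-3
    return microcurie_per_pmol * DPM_PER_MICROCURIE


def net_pmol_released(sample_dpm, blank_dpm,
                      specific_activity_mci_per_umol=0.9,
                      counted_ul=375.0, quenched_ul=475.0):
    """Estimate pmol of substrate-equivalent tritium released in one reaction.

    The blank (no desaturase) is subtracted, the counted aliquot is scaled up
    to the whole quenched volume, and the result is divided by the specific
    activity of the [9,10-3H]-stearoyl-ACP.
    """
    net_dpm = sample_dpm - blank_dpm
    total_dpm = net_dpm * quenched_ul / counted_ul
    return total_dpm / dpm_per_pmol(specific_activity_mci_per_umol)


if __name__ == "__main__":
    # Invented example counts, for illustration only.
    print(round(dpm_per_pmol(0.9), 1))
    print(round(net_pmol_released(sample_dpm=5000.0, blank_dpm=200.0), 3))
```

Dividing the result by the reaction time and by the volume of extract assayed gives an activity in pmol per min per mL of extract.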

PURIFICATION OF SOYBEAN SEED STEAROYL-ACP DESATURASE 

Developing soybean seeds, ca. 20-25 days after flowering, were harvested and stored at -80°C until 
use. 300 g of the seeds were resuspended in 600 mL of 50 mM BTP-Cl, pH 7.2, and 5 mM dithiothreitol 
(DTT) in a Waring Blender. The seeds were allowed to thaw for a few minutes at room temperature to 4°C 
and all of the purification steps were carried out at 4°C unless otherwise noted. The seeds were 
homogenized in the blender three times for 30 s each and the homogenate was centrifuged at 14,000xg for 
20 min. The resultant supernatant was centrifuged at 100,000xg for 1 h. The resultant high-speed 
supernatant was applied, at a flow rate of 5 mL/min, to a 2.5 x 20 cm Blue Sepharose® column equilibrated 
in 10 mM BTP-Cl, pH 7.2, 0.5 mM DTT. Following a wash with 2 column volumes of 10 mM BTP-Cl, pH 
7.2, 0.5 mM DTT, the bound proteins were eluted in the same buffer containing 1 M NaCl. The eluting 
protein peak, which was detected by absorbance at 280 nm, was collected and precipitated with 80% 
ammonium sulfate. Following collection of the precipitate by centrifugation at 10,000xg for 20 min, its 
resuspension in 10 mM potassium phosphate, pH 7.2, 0.5 mM DTT, overnight dialysis against the same buffer, 
and clarification through a 0.45 micron filter, it was applied to a 10 mm x 25 cm Wide-pore™ 
PEI (NH2) anion-exchange column (Baker) at 3 mL/min thoroughly equilibrated in buffer A (10 mM 
potassium phosphate, pH 7.2). After washing the column in buffer A until no protein was eluted, the column 
was subjected to elution by a gradient from buffer A at 0 min to 0.25 M potassium phosphate (pH 7.2) at 66 
min at a flow rate of 3 mL/min. Three mL fractions were collected. The desaturase activity eluted in 
fractions 17-25 (the activity peak eluted at ca. 50 mM potassium phosphate). The pooled fractions were 
made to 60 mL with buffer A and applied at 1 mL/min to a 1 x 5.5 cm alkyl-ACP column equilibrated in 
buffer A containing 0.5 mM DTT. After washing the bound protein with the start buffer until no protein was 
eluted, the bound protein was eluted by a gradient from buffer A containing 0.5 mM DTT at 0 min to 0.5 M 
potassium phosphate, pH 7.2, 0.5 mM DTT at 60 min and 1 M potassium phosphate, pH 7.2, 0.5 mM DTT. 
Four mL fractions were collected. Fractions 15-23, which contained the enzyme with the highest specific 
activity, were pooled and concentrated to 3 mL by a 30 kD Centricon® concentrator (Millipore) and desalted 
in a small column of G-25 Sephadex® equilibrated with 25 mM bis-Tris-Cl, pH 6.7. The desalted sample 
was applied at 1 mL/min to a chromatofocussing Mono P HR 5/20 (Pharmacia) column equilibrated with 25 
mM bis-Tris-Cl, pH 6.7, washed with a column volume of the same buffer, and eluted with a 1:10 dilution of 
Polybuffer 74 (Pharmacia) made to pH 5.0 with HCl. Desaturase activity eluted in two peaks: one in fraction 
30, corresponding to a pI of ca. 6.0, and the other in fraction 35, corresponding to a pI of ca. 5.7. The protein 
in the two peaks was essentially composed of a ca. 38 kD polypeptide. The first peak had a higher enzyme 
specific activity and was used for further characterization as well as for further purification on reverse-phase 
chromatography. 
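
The programmed phosphate concentration for a given fraction of the anion-exchange gradient can be estimated as below. This is a minimal sketch assuming a strictly linear gradient from 10 mM to 250 mM potassium phosphate over 66 min, 3 mL fractions collected at 3 mL/min, and fraction 1 starting at 0 min. The parameter `delay_ml` is hypothetical: it stands for the system and column volume between the gradient former and the fraction collector, which the text does not state, so the programmed value is not necessarily the concentration at which a peak actually elutes.

```python
# Sketch only: the linear-gradient model and the delay_ml parameter are
# assumptions made for illustration.


def programmed_concentration_mM(t_min, start_mM=10.0, end_mM=250.0,
                                duration_min=66.0):
    """Phosphate concentration delivered by a linear gradient at time t_min."""
    # Clamp to the gradient window so times outside it return the end points.
    fraction_done = min(max(t_min / duration_min, 0.0), 1.0)
    return start_mM + fraction_done * (end_mM - start_mM)


def fraction_midpoint_min(fraction_number, fraction_ml=3.0,
                          flow_ml_per_min=3.0, delay_ml=0.0):
    """Gradient time corresponding to the middle of a collected fraction."""
    midpoint_volume_ml = (fraction_number - 0.5) * fraction_ml - delay_ml
    return midpoint_volume_ml / flow_ml_per_min


if __name__ == "__main__":
    for fraction in (17, 21, 25):
        t = fraction_midpoint_min(fraction)
        print(fraction, round(programmed_concentration_mM(t), 1))
```

Supplying a measured delay volume shifts every fraction to an earlier point on the gradient and lowers the estimated concentration accordingly.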

Mono P fractions containing the first peak of enzyme activity were pooled and applied to a C4 reverse-phase 
HPLC column (Vydac) equilibrated with buffer A (5% acetonitrile, 0.1% trifluoroacetic acid) and 
eluted at 0.1 mL/min with a gradient of 25% buffer B (100% acetonitrile, 0.1% trifluoroacetic acid) and 75% 
buffer A at 10 min to 50% buffer B and 50% buffer A at 72.5 min. A single major peak eluted at 41.5% 
buffer B that also ran as a ca. 38 kD protein based on SDS-PAGE. The protein in the peak fraction was 
used to determine the N-terminal amino acid sequence on an Applied Biosystems 470A Gas Phase 
Sequencer. The PTH amino acids were analysed on an Applied Biosystems 120 PTH Amino Acid Analyzer. 

The N-terminal sequence of the ca. 38 kD polypeptide was determined through 16 residues and is 
shown in SEQ ID NO:3. 

CLONING OF SOYBEAN SEED STEAROYL-ACP DESATURASE cDNA 

Based on the N-terminal amino acid sequence of the purified soybean seed stearoyl-ACP desaturase 
(SEQ ID NO:3), amino acids 5 through 16, which are represented by the degenerate coding sequence SEQ 
ID NO:4, were chosen to design the complementary mixture of degenerate oligonucleotides (SEQ ID NO:5). 

The design took into account the codon bias in representative soybean seed genes encoding 
Bowman-Birk protease inhibitor [Hammond et al. (1984) J. Biol. Chem. 259:9883-9890], glycinin subunit A-2B-1a 
[Utsumi et al. (1987) Agric. Biol. Chem. 51:3267-3273], lectin (le-1) [Vodkin et al. (1983) Cell 34:1023-1031], 
and lipoxygenase-1 [Shibata et al. (1987) J. Biol. Chem. 262:10080-10085]. Five deoxyinosines were used 
at selected positions of ambiguity. 
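
The relation between a degenerate coding sequence and its complementary probe mixture can be sketched as follows. This is a minimal sketch under stated assumptions: the example sequence is invented and is not SEQ ID NO:4, the function names are hypothetical, and placing deoxyinosine (written here as I) at every fourfold-ambiguous position is only one possible rule for choosing the "selected positions", which the text does not specify. The nucleotide ambiguity codes are the standard IUPAC codes.

```python
# Sketch only: helper names, the example sequence and the inosine rule are
# assumptions made for illustration.

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each code (the code for the complements of its bases).
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def reverse_complement(seq):
    """Return the reverse complement of a sequence written in IUPAC codes."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(seq.upper()))


def substitute_inosine(seq):
    """Replace fourfold-ambiguous positions (N) with deoxyinosine (I)."""
    return seq.upper().replace("N", "I")


def degeneracy(seq):
    """Count the distinct oligonucleotides in the mixture (I counts as one)."""
    count = 1
    for base in seq.upper():
        count *= 1 if base == "I" else len(IUPAC_BASES[base])
    return count


if __name__ == "__main__":
    coding = "ATGAARTGGGCN"  # invented example: Met-Lys-Trp-Ala
    complementary = reverse_complement(coding)
    probe = substitute_inosine(complementary)
    print(complementary, degeneracy(complementary))
    print(probe, degeneracy(probe))
```

Substituting inosine at highly ambiguous positions reduces the number of distinct species in the mixture, which is the usual reason for using it in degenerate probes.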

A cDNA library was made as follows: Soybean embryos (ca. 50 mg fresh weight each) were removed 
from the pods and frozen in liquid nitrogen. The frozen embryos were ground to a fine powder in the 
presence of liquid nitrogen and then extracted by Polytron homogenization and fractionated to enrich for 
total RNA by the method of Chirgwin et al. [Biochemistry (1979) 18:5294-5299]. The nucleic acid fraction 
was enriched for poly A+ RNA by passing total RNA through an oligo-dT cellulose column and eluting the 
poly A+ RNA by salt as described by Goodman et al. [(1979) Meth. Enzymol. 68:75-90]. cDNA was 
synthesized from the purified poly A+ RNA using cDNA Synthesis System (Bethesda Research Laboratory) 
and the manufacturer's instructions. The resultant double-stranded DNA was methylated by DNA methylase 
(Promega) prior to filling-in its ends with T4 DNA polymerase (Bethesda Research Laboratory) and 
blunt-end ligating to phosphorylated Eco RI linkers using T4 DNA ligase (Pharmacia). The double-stranded DNA 
was digested with Eco RI enzyme, separated from excess linkers by passing through a gel filtration column 
(Sepharose CL-4B), and ligated to Lambda ZAP vector (Stratagene) as per manufacturer's instructions. 
Ligated DNA was packaged into phage using Gigapack packaging extract (Stratagene) according to 
manufacturer's instructions. The resultant cDNA library was amplified as per Stratagene's instructions and 
stored at -80°C. 

Following the instructions in the Lambda ZAP Cloning Kit Manual (Stratagene), the cDNA phage library was 
used to infect E. coli BB4 cells and plated to yield ca. 80,000 plaques per Petri plate (150 mm diameter). 
Duplicate lifts of the plates were made onto nitrocellulose filters (Schleicher & Schuell). Duplicate lifts from 
five plates were prehybridized in 25 mL of Hybridization buffer consisting of 6X SSC (0.9 M NaCl, 0.09 M 
sodium citrate, pH 7.0), 5X Denhardt's [0.5 g Ficoll (Type 400, Pharmacia), 0.5 g polyvinylpyrrolidone, 0.5 g 
bovine serum albumin (Fraction V; Sigma)], 1 mM EDTA, 1% SDS, and 100 ug/mL denatured salmon 
sperm DNA (Sigma Chemical Co.) at 45°C for 10 h. Ten pmol of the hybridization probe (see above) were 
end-labeled in a 52.5 uL reaction mixture containing 50 mM Tris-Cl, pH 7.5, 10 mM MgCl2, 0.1 mM 
spermidine-HCl (pH 7.0), 1 mM EDTA (pH 7.0), 5 mM DTT, 200 uCi (66.7 pmoles) of gamma-labeled AT32P 
(New England Nuclear) and 25 units of T4 polynucleotide kinase (New England Biolabs). After incubation at 
37°C for 45 min, the reaction was terminated by heating at 65°C for 10 min. Labeled probe was separated 
from unincorporated AT32P by passing the reaction through a Quick-Spin™ (G-25 Sephadex®) column 
(Boehringer Mannheim Biochemicals). The purified labeled probe (1.2 x 10^ dpm/pmole) was added to the 
prehybridized filters, following their transfer to 10 mL of fresh Hybridization buffer. Following incubation of 
the filters in the presence of the probe for 16 h in a shaker at 48°C, the filters were washed in 200 mL of 
Wash buffer (6X SSC, 0.1% SDS) five times for 5 min each at room temperature, and then once at 48°C 
for 5 min. The washed filters were air dried and subjected to autoradiography on Kodak XAR-2 film in the 
presence of intensifying screens (Lightning Plus, DuPont Cronex®) at -80°C overnight. Six 
positively-hybridizing plaques were subjected to plaque purification as described in Sambrook et al. [(1989) Molecular 
Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press]. Following the Lambda ZAP 
Cloning Kit Instruction Manual (Stratagene), sequences of the pBluescript vector, including the cDNA 
inserts, from each of six purified phages were excised in the presence of a helper phage and the resultant 
phagemids were used to infect E. coli XL-1 Blue cells resulting in double-stranded plasmids, pDS1 to pDS6. 
The restriction maps of all six plasmids, though not identical, showed a common 0.7 kb Bgl II fragment 
found in the desaturase gene (see below). 
DNA from plasmids pDS1-pDS6 was prepared by the alkaline lysis miniprep procedure described in 
Sambrook et al. [(1989) Molecular Cloning: A Laboratory Manual, 2nd Ed. Cold Spring Harbor Laboratory 
Press]. The alkali-denatured double-stranded DNAs were sequenced using Sequenase® T7 DNA 
polymerase (US Biochemical Corp.) and the manufacturer's instructions. The sequence of the cDNA insert in 
plasmid pDS1 is shown in SEQ ID NO:1. 

EXAMPLE 2 

EXPRESSION OF SOYBEAN SEED STEAROYL-ACP DESATURASE IN E. COLI 

Construction of Glutathione-S-Transferase: Stearoyl-ACP Desaturase Fusion Protein 

Plasmid pDS1 was linearized with Hind III enzyme, its ends filled-in with Klenow fragment (Bethesda 
Research Laboratory) in the presence of 50 uM each of all four deoxynucleotide triphosphates as per 
manufacturer's instructions, and extracted with phenol:chloroform (1:1). Phosphorylated Eco RI linkers (New 
England Biolabs) were ligated to the DNA using T4 DNA ligase (New England Biolabs). Following partial 
digestion with Bgl II enzyme and complete digestion with excess Eco RI enzyme, the DNA was run on an 
agarose gel and stained with ethidium bromide. The 2.1 kb DNA fragment resulting from a partial Bgl II and 
Eco RI digestion was cut out of the gel, purified using USBioclean™ (US Biochemicals), and ligated to Bam 
HI and Eco RI cleaved vector pGEX2T [Pharmacia; see Smith et al. (1988) Gene 67:31] using T4 DNA 
ligase (New England Biolabs). The ligated mixture of DNAs was used to transform E. coli XL-1 Blue cells 
(Stratagene). Transformants were picked as ampicillin-resistant cells and the plasmid DNA from several 
transformants was analyzed by double restriction digestion with Bam HI and Eco RI, as described by 
Sambrook et al. [(1989) Molecular Cloning: A Laboratory Manual, 2nd Ed. Cold Spring Harbor Laboratory 
Press]. Plasmid DNA from one transformant, called pGEXB, showed the restriction pattern expected from 
the correct fusion. The double-stranded plasmid pGEXB was purified and sequenced to confirm the correct 
fusion by the Sequenase kit (US Biochemical Corp.). The fusion protein consists of a 28 kD 
glutathione-S-transferase protein fused at its C-terminus to the desaturase precursor protein at Ser at residue -10 from the 
N-terminus of the mature enzyme (Arg, +1) (SEQ ID NO:1). Thus, it includes ten amino acids from the 
transit peptide sequence in addition to the mature protein. 

Inducible Expression of the Glutathione-S-Transferase-Stearoyl-ACP Desaturase Fusion Protein 

Five mL precultures of E. coli cells harboring plasmids pGEXB and pGEX2T, which were grown overnight at 37°C in LB 
medium [Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed. Cold Spring Harbor 
Laboratory Press] containing 100 ug/mL ampicillin, were diluted 1:10 in fresh LB medium containing 100 
ug/mL ampicillin and continued to grow on a shaker at 37°C for another 90 min before adding 
isopropylthio-β-D-galactoside and ferric chloride to final concentrations of 0.3 mM and 50 uM, respectively. 
After an additional 3 h on a shaker at 37°C, the cultures were harvested by centrifugation at 4,000xg for 10 
min at 4°C. The cells were resuspended in one-tenth of the culture volume of freshly-made and ice-cold 
Extraction buffer (20 mM sodium phosphate, pH 8.0, 150 mM NaCl, 5 mM EDTA and 0.2 mM 
phenylmethylsulfonyl fluoride) and re-centrifuged as above. The resultant cells were resuspended in 1/50 vol of the 
culture in Extraction buffer and sonicated for three ten-second bursts. The sonicated extracts were made to 
1% in Triton X-100 and centrifuged at 8,000xg for 1 min in an Eppendorf Micro Centrifuge (Brinkmann 
Instruments) to remove the cellular debris. The supernatant was poured into a fresh tube and used for 
enzyme assays, SDS-PAGE analysis and purification of the fusion protein. 

Five uL aliquots of the extracts were assayed for stearoyl-ACP desaturase activity in a 1 min reaction, 
as described in Example 1. The activities [net pmol of stearoyl-ACP desaturated per min per mL of extract; 
the blank (no desaturase enzyme) activity was 15 pmol/min/mL] are shown below: 

Reaction mixture                                  Net pmol/min/mL 

E. coli (pGEX2T)                                  0 
E. coli (pGEXB)                                   399 
E. coli (pGEXB) - NADPH                           0 
E. coli (pGEXB) - ferredoxin                      0 
E. coli (pGEXB) - ferredoxin-NADPH reductase      3 

These results show that the desaturase enzyme activity is present in the extract of E. coli cells 
containing pGEXB but not in that of cells containing the control plasmid pGEX2T. Furthermore, this activity 
was dependent on an exogenous electron donor. 

Proteins in extracts of E. coli cells harboring plasmids pGEX2T or pGEXB were resolved by 
SDS-PAGE, transferred onto Immobilon®-P (Millipore) and cross-reacted with mouse antibody made against 
purified soybean stearoyl-ACP desaturase, as described by Sambrook et al. [(1989) Molecular Cloning: A 
Laboratory Manual, 2nd Ed. Cold Spring Harbor Laboratory Press]. The resultant Western blot showed that 
pGEXB encodes a ca. 64 kD GST-stearoyl-ACP desaturase fusion polypeptide, although some lower 
molecular-weight cross-reacting polypeptides can also be observed, which may represent either a 
degradation or incomplete synthesis of the fusion protein. It is not known whether the GST-desaturase fusion protein 
is enzymatically active, since the activity observed may be due to the incomplete fusion by-product peptides 
seen here. The fusion polypeptide was not present in extracts of cells harboring the control plasmid 
(pGEX2T) nor in extracts of cells harboring pGEXB that were not induced by isopropylthio-β-D-galactoside. 

Purification of the Glutathione-S-Transferase-Stearoyl-ACP Desaturase Fusion Protein 

The GST-desaturase fusion protein was purified in a one step glutathione-agarose affinity 
chromatography under non-denaturing conditions, following the procedure of Smith et al. [Gene (1988) 67:31]. For this 
the bacterial cell extract was mixed with 1 mL glutathione-agarose (sulfur-linkage, Sigma), equilibrated with 
20 mM sodium phosphate, pH 8.0, 150 mM NaCl, for 10 min at room temperature. The beads were 
collected by centrifugation at 1000xg for 1 min, and washed three times with 1 mL of 20 mM sodium 
phosphate, pH 8.0, 150 mM NaCl (each time the beads were collected by centrifugation as described 
above). The fusion protein was eluted with 5 mM reduced glutathione (Sigma) in 50 mM Tris-Cl, pH 8.0. 
The proteins in the eluted fraction were analyzed by SDS-PAGE and consisted of mostly pure ca. 64 kD 
GST-desaturase polypeptide, 28 kD GST and a trace of ca. 38 kD desaturase polypeptide. The fusion 
polypeptide was cleaved in the presence of thrombin, as described by Smith et al. [Gene (1988) 67:31]. 

Construction of β-Galactosidase-Stearoyl-ACP Desaturase Fusion Protein 

Plasmid pDS1 DNA was digested with Ssp I and Pvu I enzymes and the digested DNA fragments were 
resolved by electrophoresis in agarose. The blunt-ended 2.3 kb Ssp I fragment was cut out of the agarose 
(Pvu I cleaves a contaminating 2.3 kb Ssp I fragment), purified by USBioclean™ (US Biochemical Corp.), 
and ligated to vector plasmid pBluescript SK (-) (Stratagene) that had previously been filled-in with Klenow 
fragment (Bethesda Research Laboratory) following linearization with Not I enzyme. The ligated DNAs were 
transformed into competent E. coli XL-1 Blue cells. Plasmid DNA from several ampicillin-resistant 
transformants was analysed by restriction digestion. One plasmid, called pNS2, showed the expected restriction 
map. This plasmid is expected to encode a ca. 42 kD fusion protein consisting of 4 kD N-terminal of 
β-galactosidase fused at its C-terminus to isoleucine at residue +10 from the N-terminus of the mature 
desaturase protein (Arg, +1) (SEQ ID NO:1). Thus, it includes all but the first 10 amino acids of the mature 
protein. Nucleotide sequencing has not been performed on pNS2 to confirm correct fusion. 

Five mL of preculture of E. coli cells harboring plasmid pNS2 grown overnight in LB medium containing 
100 ug/mL ampicillin was added to 50 mL of fresh LB medium with 100 ug/mL ampicillin. After an additional 1 
h of growth at 37°C in a shaker, isopropylthio-β-D-galactoside and ferric chloride were added to final 
concentrations of 0.3 mM and 50 uM, respectively. After another 2 h on a shaker at 37°C the culture was 
harvested by centrifugation at 4,000xg for 10 min at 4°C. The cells were resuspended in 1 mL of 
freshly-made and ice-cold TEP buffer (100 mM Tris-Cl, pH 7.5, 10 mM EDTA and 0.1 mM phenylmethylsulfonyl 
fluoride) and recentrifuged as above. The cells were resuspended in 1 mL of TEP buffer and sonicated for 
three ten-second bursts. The sonicates were made to 1% in Triton X-100, allowed to stand in ice for 5 min, 
and centrifuged at 8,000xg for 1 min in an Eppendorf Micro Centrifuge (Brinkmann Instruments) to remove 
the cellular debris. The supernatant was poured into a fresh tube and used for enzyme assays and 
SDS-PAGE analysis. 

A 1 µL aliquot of the extract of E. coli cells containing plasmid pNS2 was assayed for stearoyl-ACP 
desaturase activity in a 5 min reaction, as described above. The extract showed activity of 288 pmol of 
stearoyl-ACP desaturated per min per mL of the extract [the blank (no desaturase enzyme) activity was 15 
pmol/min/mL]. 

Proteins in the extract of E. coli cells harboring plasmid pNS2 were resolved by SDS-PAGE, 
transferred onto Immobilon®-P (Millipore) and cross-reacted with mouse antibody made against purified 
soybean stearoyl-ACP desaturase, as described in Sambrook et al. [(1989) Molecular Cloning: A Laboratory 

EXAMPLE 3 

USE OF SOYBEAN SEED STEAROYL[illegible] 

Plasmid pDS1 was linearized by digestion with restriction [illegible] described in Sambrook et al. 
[(1989) Molecular Cloning: A Laboratory Manual, 2nd Ed. Cold Spring Harbor Laboratory Press] and labeled 
with 32P [illegible] (Bethesda Research Laboratories) under conditions recommended by the manufacturer. 
The [illegible] probe was used to probe a Southern blot [Sambrook et al., (1989) Molecular Cloning: A 
Laboratory Manual, 2nd Ed. Cold Spring Harbor Laboratory Press] containing genomic DNA from soybean 
[Glycine max (cultivar Bonus) and Glycine soja (PI81762)], digested with one of several [illegible]. 
[illegible] standard conditions [Sambrook et al., (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed. 
Cold Spring Harbor Laboratory Press] autoradiograms were obtained and [illegible] patterns of 
hybridization (polymorphisms) were identified in digests performed with [illegible]. The same probe was 
then used to map the polymorphic pDS1 [illegible] described by Helentjaris et al. [(1986) Theor. Appl. 
Genet. 72:761-769]. Plasmid pDS1 probe [illegible], as described above, to Southern blots of Eco RI or 
Pst I digested genomic DNAs [illegible] a G. max Bonus x G. soja PI81762 cross. The [illegible] 
interpreted as resulting from the inheritance of either paternal [illegible] or both (a heterozygote). The 
resulting data were subjected to genetic [illegible] [(1987) Genomics 1:174-181]. In [illegible] data for 
436 anonymous RFLP [illegible] 14E p. 291, abstract R153], we [illegible] loss of seed stearoyl-ACP 
desaturase enzyme. 

SEQUENCE LISTING 
GENERAL INFORMATION: 

(i) APPLICANT: Hitz, William D. 

Yadav, Narendra S 

(ii) TITLE OF THE INVENTION: Nucleotide 

Sequence of Soybean Stearoyl-ACP 
Desaturase cDNA 

(iii) NUMBER OF SEQUENCES: 5 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: E. I. du Pont de 

Nemours and Company 

(B) STREET: 1007 Market Street 

(C) CITY: Wilmington 

(D) STATE: Delaware 

(E) COUNTRY: USA 

(F) ZIP: 19898 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: DISKETTE, 3.50 

inch, 1.0 MB 

(B) COMPUTER: Apple Macintosh 

(C) OPERATING SYSTEM: 

(D) SOFTWARE: 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 07/529,049 

(B) FILING DATE: 25-MAY-1990 

(C) CLASSIFICATION: 

(vii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Bruce W. Morrissey 

(B) REGISTRATION NUMBER: 30,663 

(C) REFERENCE/DOCKET NUMBER: BB-1022 

(viii) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (302) 892-4927 

(B) TELEFAX: (302) 892-7949 

(C) TELEX: 835420 



INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2243 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA to mRNA 

(iii) HYPOTHETICAL: No 

(iv) ANTISENSE: No 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Glycine max 

(B) STRAIN: Cultivar Wye 

(D) DEVELOPMENTAL STAGE: Developing 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: cDNA to mRNA 

(B) CLONE: pDS1 

(ix) FEATURE: 

(A) NAME/KEY: 

(i) 5' non-coding sequence 

(ii) Putative translation 
initiation codon 

(iii) Putative transit 
peptide coding sequence 

(iv) Mature protein coding 
sequence 

(v) Translation termination 
codon 

(vi) 3' non-coding sequence 
(B) LOCATION: 

(i) nucleotides 1 through 69 

(ii) nucleotides 70 through 72 

(iii) nucleotides 70 through 165 

(iv) nucleotides 166 through 
1242 

(v) nucleotides 1243 through 
1245 

(vi) nucleotides 1246 through 
2243 

(C) IDENTIFICATION METHOD: 

(i) deduced by proximity to 
ii) below 

(ii) similarity of the context 
of the methionine codon in 
the open reading frame to 
translation initiation 
codons of other plastid 
transit peptides 

(iii) deduced by proximity to 
ii) above and iv) below 

(iv) experimental determination 
of N-terminal amino acid 
sequence and subunit size 
of purified soybean seed 
stearoyl-ACP desaturase 

(v) The translation 
termination codon ends 
the open reading frame for 
a protein of the expected 
size 

(vi) established by proximity 
to v) above 

(D) OTHER INFORMATION: 

Extracts of E. coli expressing the 
mature protein as a fusion protein 
show stearoyl-ACP desaturase 
activity and produce a protein 
that cross-reacts to stearoyl-ACP 
desaturase antibody 

(x) PUBLICATION INFORMATION: Sequence not 

published. 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

CTTCTACATT ACTCTCTCTT CTCCTAAAAA TTTCTAATGC  40

TTCCATTGCT TCATCTGACT CACTCATCA ATG GCT CTG AGA CTG AAC CCT  90
                                Met Ala Leu Arg Leu Asn Pro
                                -32     -30

ATC CCC ACC CAA ACC TTC TCC CTC CCC CAA ATG CCC AGC CTC AGA  135
Ile Pro Thr Gln Thr Phe Ser Leu Pro Gln Met Pro Ser Leu Arg
-25                 -20                 -15

TCT CCC CGC TTC CGC ATG GCT TCC ACC CTC CGC TCC GGT TCC AAA  180
Ser Pro Arg Phe Arg Met Ala Ser Thr Leu Arg Ser Gly Ser Lys
-10                 -5                  1               5

GAG GTT GAA AAT ATT AAG AAG CCA TTC ACT CCT CCC AGA GAA GTG  225
Glu Val Glu Asn Ile Lys Lys Pro Phe Thr Pro Pro Arg Glu Val
                10                  15                  20

CAT GTT CAA GTA ACC CAC TCT ATG CCT CCC CAG AAG ATT GAG ATT  270
His Val Gln Val Thr His Ser Met Pro Pro Gln Lys Ile Glu Ile
                25                  30                  35

TTC AAA TCT TTG GAG GAT TGG GCT GAC CAG AAC ATC TTG ACT CAT  315
Phe Lys Ser Leu Glu Asp Trp Ala Asp Gln Asn Ile Leu Thr His
                40                  45                  50

CTT AAA CCT GTA GAA AAA TGT TGG CAA CCA CAG GAT TTT TTA CCC  360
Leu Lys Pro Val Glu Lys Cys Trp Gln Pro Gln Asp Phe Leu Pro
                55                  60                  65

GAC CCC TCC TCA GAT GGA TTT GAA GAG CAA GTG AAG GAA CTG AGA  405
Asp Pro Ser Ser Asp Gly Phe Glu Glu Gln Val Lys Glu Leu Arg
                70                  75                  80

GAG AGA GCA AAG GAG ATT CCA GAT GAT TAC TTT GTT GTT CTT GTC  450
Glu Arg Ala Lys Glu Ile Pro Asp Asp Tyr Phe Val Val Leu Val
                85                  90                  95

GGA GAC ATG ATC ACA GAG GAA GCT CTG CCT ACT TAC CAA ACT ATG  495
Gly Asp Met Ile Thr Glu Glu Ala Leu Pro Thr Tyr Gln Thr Met
                100                 105                 110

TTA AAT ACT TTG GAT GGA GTT CGT GAT GAA ACA GGT GCC AGC CTT  540
Leu Asn Thr Leu Asp Gly Val Arg Asp Glu Thr Gly Ala Ser Leu
                115                 120                 125

ACT TCC TGG GCA ATT TGG ACA AGG GCA TGG ACT GCT GAA GAA AAC  585
Thr Ser Trp Ala Ile Trp Thr Arg Ala Trp Thr Ala Glu Glu Asn
                130                 135                 140

AGA CAC GGT GAT CTT CTT AAC AAA TAT CTG TAC TTG AGT GGA CGA  630
Arg His Gly Asp Leu Leu Asn Lys Tyr Leu Tyr Leu Ser Gly Arg
                145                 150                 155

GTT GAC ATG AAA CAA ATT GAG AAG ACA ATT CAG TAC CTT ATT GGG  675
Val Asp Met Lys Gln Ile Glu Lys Thr Ile Gln Tyr Leu Ile Gly
                160                 165                 170

TCT GGG ATG GAT CCT CGG ACC GAG AAC AGC CCC TAC CTT GGT TTC  720
Ser Gly Met Asp Pro Arg Thr Glu Asn Ser Pro Tyr Leu Gly Phe
                175                 180                 185

ATT TAC ACT TCA TTT CAA GAG AGG GCA ACC TTC ATA TCC CAC GGA  765
Ile Tyr Thr Ser Phe Gln Glu Arg Ala Thr Phe Ile Ser His Gly
                190                 195                 200

AAC ACG GCC AGG CTT GCG AAG GAG CAT GGT GAC ATA AAA TTG GCA  810
Asn Thr Ala Arg Leu Ala Lys Glu His Gly Asp Ile Lys Leu Ala
                205                 210                 215

[nucleotide row illegible] 
[illegible] Ile Cys Gly Met Ile Ala Ser Asp Glu [illegible] 
220 

GCA TAC ACA AAG ATA GTG GAA AAG CTG TTT GAG GTT GAT CCT GAT  900 
Ala Tyr Thr Lys Ile Val Glu Lys Leu Phe Glu Val Asp Pro Asp 

GGT ACA GT? ATG GCA TTT GCC GAC ATG ATG AGG AAG ??G ATT GCT  945 
Gly Thr Val Met Ala Phe Ala Asp Met Met Arg Lys [illegible] 
250 

[row illegible] 
265 

GAT AAC TAC TCT GCC GTC GCG CAG CGC ATT GGG GTC TAC ACT GCA  1035 
Asp Asn Tyr Ser Ala Val Ala Gln Arg Ile Gly Val Tyr Thr Ala 
290 

GTG GAG CAG CTA ACC GGA CTT TCA GGT GAG GGA AGA AAG GCT CAG  1125 
Val Glu Gln Leu Thr Gly Leu Ser Gly Glu Gly Arg Lys Ala Gln 
310 

[row illegible] 
325 

Arg Ala Gln Ala Arg Gly Lys Glu Ser Ser Thr [illegible] 
340 


TGG ATT CAT GAC AGG GAA GTA CTA CTC TAAATGCT TGCACCAAGG  1260
Trp Ile His Asp Arg Glu Val Leu Leu
                355             359

GAGGAGCATG GTGAATCTTC CAGCAATACC ATTCTGAGAA ATGTTGAATA  1310

GTTGAAAATT CAGTTTGTCA TTTTTATCTT TTTTTTCTCC TGTTTTTTGG  1360

TCTTATGTTA TATGCCACTG TAAGGTGAAA CAGTTGTTCT TGCATGGTTC  1410

GCAAGTTAAG CAGTTAGGGG CAGCTGTAGT ATTAGAAATG CTATTTTTTG  1460

TTTCCCTTTT CTGTGGTAGT GATGTCTGTG GAAGTATAAG TAAACGTTTT  1510

TTTTTTCTCT GGCAATTTTG ATGATAAAGA AAATTTAGTT CTAAAAACCG  1560

TCGCACCTTC CCTGAGGCTT CTCTTGTCTG TCGCGAGTGA CCATGGTGAG  1610

GGTTAGTGTG CTGAACGATG CTCTGAAGAG CATGTACAAT GCTGAGAAAA  1660

GGGGAAAGCG CCAAGTCATG ATTCGGCCAT CCTCCAAAGT CATTATCAAA  1710

TTCCTTTTGG TGATGCAGAA GCACGGATAC ATTGGAGAGT TTGAGTATGT  1760

TGATGACCAC AGGGCTGGTA AAATCGTGGT TGAATTGAAC GGTAGACTGA  1810

ACAAGTGTGG GGTTATTAGT CCCCGTTTTG ATGTCGGCGT CAAAGAGATT  1860

GAAGGTTGGA CTGCTAGGCT TCTCCCCTCA AGACAGTTTG GGTATATTGT  1910

ATTGACTACC TCTGCCGGCA TCATGGATCA CGAAGAAGCT AGGAGAAAAA  1960

ATGTTGGTGG TAAGGTACTG GGTTTCTTCT ACTAGAGTTT AATTTCGATT  2010

AAGAGGATGT CAGGAATTTC AATTGAGATT CATGGATTGT AATGGAGGAT  2060

ATGCTAGGCC CCTAGTAATA TCAAGCATAG CAGGAGCTGT TTTGTGATGT  2110

TCCTTATTTT GTTTGCAAAA CCAAGTTGGT AACTATAACT TTTATTTTCT  2160

TTTATCATTA TTTTTCTTTA TACCAAAATG TACTGGCCAA GTTGTTTTAA  2210

ACAGTGAGAA CTTTGATTAG AAAAAAAAAA AAA  2243
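
The feature locations listed for SEQ ID NO:1 can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the filed listing; the names `FEATURES` and `span` are hypothetical. It assumes 1-based, inclusive coordinates as given under (ix)(B), and confirms that the features tile the 2243 nucleotides without gaps and that the transit peptide and mature protein regions are whole numbers of codons.

```python
# Feature locations copied from (ix)(B) of SEQ ID NO:1 (1-based, inclusive).
# The initiation codon (70-72) lies inside the transit peptide region and is
# therefore not listed separately here.
FEATURES = [
    ("5' non-coding sequence", 1, 69),
    ("transit peptide coding sequence", 70, 165),
    ("mature protein coding sequence", 166, 1242),
    ("translation termination codon", 1243, 1245),
    ("3' non-coding sequence", 1246, 2243),
]
SEQUENCE_LENGTH = 2243


def span(start, end):
    """Number of nucleotides in an inclusive 1-based interval."""
    return end - start + 1


# Each feature should begin one nucleotide after the previous one ends.
contiguous = all(prev[2] + 1 == nxt[1] for prev, nxt in zip(FEATURES, FEATURES[1:]))
total = sum(span(start, end) for _, start, end in FEATURES)

# Coding regions should divide evenly into codons.
transit_codons, transit_rem = divmod(span(70, 165), 3)
mature_codons, mature_rem = divmod(span(166, 1242), 3)

print(contiguous, total == SEQUENCE_LENGTH)
print(transit_codons, mature_codons)
```

The codon counts agree with the residue numbering of the listing, which runs from -32 to -1 for the transit peptide and from 1 to 359 for the mature protein.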



(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 216 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA to mRNA 

(iii) HYPOTHETICAL: No 

(iv) ANTISENSE: No 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Glycine max 

(B) STRAIN: Cultivar Wye 

(D) DEVELOPMENTAL STAGE: Developing 

seeds 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: cDNA to mRNA 

(B) CLONE: pDS4a 

(ix) FEATURE: 

(A) NAME/KEY: 3' non-coding sequence 

(B) LOCATION: nucleotides 1 through 

216 

(C) IDENTIFICATION METHOD: Homology of 
clones pDS4a and pDS1 
and similarity of 
sequence in SEQ ID NO:1 
to 3' non-coding 
sequence in SEQ ID NO:1 

(x) PUBLICATION INFORMATION: Sequence not 

published. 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

GAAATGTTGA ATAGTTGAAA ATTCAGTTTG TCATTTTTAT CTTTTATTTT 50 

TTCTCCTTTT TTGGTCTTTG TTATATGTCA CTGTAAGGTG AAGCAGTTGT 100 

TCTTGCATGG TTCGCAAGTT AAGCAGTTAG GGGCAGCTGT AGTATTAGAA 150 

ATGGTATTTT TTTTTTTGTT TTCGCTTTTC TCTGTGGTAG TGATGTCTGT 200 

CGAAGTATAA GTAAAC 216 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: No 

(v) FRAGMENT TYPE: N-terminal fragment 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Glycine max 

(B) STRAIN: Cultivar Wye 

(C) DEVELOPMENTAL STAGE: Developing 

seeds 

(ix) FEATURE: 

(A) NAME/KEY: N-terminal sequence 

(B) LOCATION: 1 through 16 amino acid 

residues 

(C) IDENTIFICATION METHOD: N-terminal 
amino acid sequencing 

(x) PUBLICATION INFORMATION: Sequence not 

published 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 



Arg Ser Gly Ser Lys Glu Val Glu Asn Ile Lys Lys Pro Phe Thr Pro
1               5                   10                  15



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid: mixture 
of oligonucleotides 

(iii) HYPOTHETICAL: Yes 

(ix) FEATURE: 

(A) NAME/KEY: Coding sequence 

(B) LOCATION: 1 through 36 bases 

(x) PUBLICATION INFORMATION: Sequence not 
published 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 



AAR GAR GTN GAR AAY ATH AAR AAR CCN TTY ACN CCN  36
Lys Glu Val Glu Asn Ile Lys Lys Pro Phe Thr Pro
1               5                   10
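
SEQ ID NO:4 is written with IUPAC degeneracy codes, so it stands for a mixture of oligonucleotides. The following sketch, which is an illustration and not part of the listing, expands each degenerate codon against the standard genetic code to check that every member of the mixture encodes the peptide printed beneath it, and counts how many distinct sequences the mixture contains. The helper `expand` and the one-letter peptide string are assumptions introduced for the example.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC codes used in SEQ ID NO:4.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "H": "ACT", "N": "ACGT"}

SEQ4 = "AARGARGTNGARAAYATHAARAARCCNTTYACNCCN"
PEPTIDE = "KEVENIKKPFTP"  # Lys Glu Val Glu Asn Ile Lys Lys Pro Phe Thr Pro


def expand(codon):
    """All concrete codons represented by one degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in codon))]


codons = [SEQ4[i:i + 3] for i in range(0, len(SEQ4), 3)]

# Total number of distinct oligonucleotides in the mixture.
degeneracy = 1
for codon in codons:
    degeneracy *= len(expand(codon))

# Set of amino acids each degenerate codon can encode.
encoded = [{CODON_TABLE[c] for c in expand(codon)} for codon in codons]

print(degeneracy)
print(all(aa_set == {aa} for aa_set, aa in zip(encoded, PEPTIDE)))
```

Under these assumptions every degenerate codon maps to exactly one amino acid, which is the property a probe designed from a peptide sequence needs.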



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid: mixture 
of synthetic oligonucleotides 

(ix) FEATURE: 

(C) OTHER INFORMATION: N at positions 
3, 6, 9, and 27 is deoxyinosine. 

(x) PUBLICATION INFORMATION: Sequence not 
published 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 



GGNGTNAANG GCTTCTTRAT RTTYTCNACN TCCTT 35 
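
The listing does not state how SEQ ID NO:5 relates to SEQ ID NO:4. As a hedged check, the sketch below tests whether SEQ ID NO:5 is compatible, position by position, with the reverse complement of the first 35 bases of SEQ ID NO:4. This alignment is an assumption made for illustration, and deoxyinosine is treated simply as N (any base). The function name `reverse_complement_sets` is hypothetical.

```python
# IUPAC codes appearing in SEQ ID NO:4 and SEQ ID NO:5.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "H": "ACT", "N": "ACGT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

SEQ4 = "AARGARGTNGARAAYATHAARAARCCNTTYACNCCN"
SEQ5 = "GGNGTNAANGGCTTCTTRATRTTYTCNACNTCCTT"


def reverse_complement_sets(seq):
    """For each position of the reverse complement, the set of allowed bases."""
    return [{COMPLEMENT[b] for b in IUPAC[code]} for code in reversed(seq)]


# Assumed alignment: SEQ ID NO:5 pairs with SEQ ID NO:4 minus its last base,
# so the first position of the 36-base reverse complement is dropped.
antisense = reverse_complement_sets(SEQ4)[1:]

# Two positions are compatible when their allowed base sets overlap.
compatible = all(set(IUPAC[code]) & allowed for code, allowed in zip(SEQ5, antisense))

print(len(SEQ5), len(antisense), compatible)
```

If the check passes, SEQ ID NO:5 can be read as a less degenerate antisense counterpart of SEQ ID NO:4, with fixed bases or inosine chosen at several wobble positions.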



Claims 

1. An isolated nucleic acid fragment comprising a nucleotide sequence encoding the soybean seed 
stearoyl-ACP desaturase corresponding to the nucleotides 1 to 2243 in SEQ ID NO:1, or any soybean 
nucleic acid fragment substantially homologous therewith encoding a functional stearoyl-ACP de- 
saturase. 

2. An isolated nucleic acid fragment of Claim 1 wherein said nucleotide sequence encodes the soybean 
seed stearoyl-ACP desaturase precursor corresponding to nucleotides 70-1245 in SEQ ID NO:1, or any 
soybean nucleic acid fragment substantially homologous therewith encoding a functional stearoyl-ACP 
desaturase precursor. 

3. A nucleic acid fragment of Claim 2, wherein the said nucleotide sequence encodes the mature soybean 
seed stearoyl-ACP desaturase enzyme, corresponding to nucleotides 166 to 1245 in SEQ ID NO:1. 

4. A chimeric gene capable of transforming a soybean plant cell comprising a nucleic acid fragment of 
Claim 1 operably linked to suitable regulatory sequences producing antisense inhibition of soybean 
seed stearoyl-ACP desaturase in the seed. 

5. A chimeric gene capable of transforming a plant cell of an oil-producing species comprising a nucleic 
acid fragment of Claim 2 operably linked to suitable regulatory sequences resulting in overexpression 
of said soybean seed stearoyl-ACP desaturase in the plastid of said plant cell. 

6. A chimeric gene capable of transforming a plant cell of an oil-producing species comprising a nucleic 
acid fragment of Claim 3 operably linked to suitable regulatory sequences resulting in the expression of 
said mature soybean seed stearoyl-ACP desaturase enzyme in the cytoplasm of said plant cell. 

7. A method of producing soybean seed oil containing higher-than-normal levels of stearic acid compris- 
ing: 

(a) transforming a soybean plant cell with a chimeric gene of Claim 4, 

(b) growing fertile soybean plants from said transformed soybean plant cells, 

(c) screening progeny seeds from said fertile soybean plants for the desired levels of stearic acid, 
and 

(d) crushing said progeny seed to obtain said soybean oil containing higher-than-normal levels of 
stearic acid. 

8. A method of producing oils from plant seed containing lower-than-normal levels of stearic acid 
comprising: 

(a) transforming a plant cell of an oil producing species with a chimeric gene of Claims 5 or 6, 

(b) growing sexually mature plants from said transformed plant cells of an oil producing species, 

(c) screening progeny seeds from said fertile plants for the desired levels of stearic acid, and 

(d) crushing said progeny seed to obtain said oil containing lower-than-normal levels of stearic acid. 

9. A method of Claim 8 wherein said plant cell of an oil producing species is selected from the group 
consisting of soybean, rapeseed, sunflower, cotton, cocoa, peanut, safflower, and corn. 

10. A method of Claim 7 wherein said step of transforming is accomplished by a process selected from the 
group consisting of Agrobacterium infection, electroporation, and high-velocity ballistic bombardment. 

11. A method of Claim 8 wherein said step of transforming is accomplished by a process selected from the 
group consisting of Agrobacterium infection, electroporation, and high-velocity ballistic bombardment. 

12. A method of producing mature soybean seed stearoyl-ACP desaturase enzyme in microorganisms 
comprising: 

(a) transforming a microorganism with a chimeric gene of Claim 6, 

(b) growing said transformed microorganism to produce quantities of said mature soybean seed 
stearoyl-ACP desaturase enzyme, and 

(c) isolating and purifying said mature soybean seed stearoyl-ACP desaturase enzyme. 

13. A method of breeding soybean plants producing altered stearic acid levels in seed oil due to altered 
levels of stearoyl-ACP desaturase in said soybean plants by RFLP mapping comprising: 

(a) making a cross between two soybean varieties differing in stearic acid levels due to altered levels 
of stearoyl-ACP desaturase; 

(b) making a Southern blot of genomic DNA isolated from several progeny plants resulting from the 
cross following digestion with a suitable restriction enzyme that reveals polymorphism linked to the 
altered levels of stearic acid using a radiolabelled nucleic acid fragment of Claim 1 as a hybridization 
probe; 

(c) hybridizing the Southern blot with the radiolabelled nucleic acid fragment of Claim 1; and 

(d) selecting said soybean plants that inherit the RFLP linked to the desired level of stearic acid. 

Claims (translated from the German text) 

1. An isolated nucleic acid fragment comprising a nucleotide sequence which encodes soybean seed 
stearoyl-ACP desaturase and corresponds to nucleotides 1 to 2243 in SEQ ID NO:1, or a soybean nucleic 
acid fragment which is substantially homologous thereto and encodes a functional stearoyl-ACP 
desaturase. 

2. An isolated nucleic acid fragment according to Claim 1, wherein said nucleotide sequence encodes the 
soybean seed stearoyl-ACP desaturase precursor, corresponding to nucleotides 70 to 1245 in SEQ ID 
NO:1, or a soybean nucleic acid fragment which is substantially homologous thereto and encodes a 
functional stearoyl-ACP desaturase precursor. 

3. A nucleic acid fragment according to Claim 2, in which said nucleotide sequence encodes the 
stearoyl-ACP desaturase enzyme of mature soybean seed, corresponding to nucleotides 166 to 1245 in 
SEQ ID NO:1. 

4. A chimeric gene capable of transforming a soybean plant cell, comprising a nucleic acid fragment 
according to Claim 1 operably linked to suitable regulatory sequences which produce antisense 
inhibition of soybean seed stearoyl-ACP desaturase in the seed. 

5. A chimeric gene capable of transforming a plant cell of an oil-producing species, comprising a 
nucleic acid fragment according to Claim 2 operably linked to suitable regulatory sequences, resulting 
in overexpression of said soybean seed stearoyl-ACP desaturase in the plastid of said plant cell. 

6. A chimeric gene capable of transforming a plant cell of an oil-producing species, comprising a 
nucleic acid fragment according to Claim 3 operably linked to suitable regulatory sequences, resulting 
in expression of said stearoyl-ACP desaturase enzyme of mature soybean seed in the cytoplasm of said 
plant cell. 

7. A method of producing soybean seed oil containing higher-than-normal concentrations of stearic acid, 
comprising: 

(a) transforming a soybean plant cell with a chimeric gene according to Claim 4, 

(b) growing fertile soybean plants from said transformed soybean plant cells, 

(c) screening the progeny seeds from said fertile soybean plants [illegible] 

(d) [illegible] containing higher-than-normal stearic acid concentrations. 

8. A method of producing oils from plant seeds containing lower-than-normal stearic acid 
[illegible] containing lower-than-normal stearic acid concentrations. 

9. [illegible] peanut, safflower and corn. 

10. [illegible] electroporation and high-velocity ballistic bombardment. 

11. [illegible] electroporation and high-velocity ballistic bombardment. 

12. A method of producing the stearoyl-ACP desaturase enzyme of mature soybean seed in [illegible] 

13. [illegible] soybean plants that inherit the RFLP linked to the desired stearic acid concentration. 


Claims (translated from the French text) 

1. An isolated nucleic acid fragment comprising a [illegible] functional desaturase. 

2. An isolated nucleic acid fragment of Claim 1, wherein said nucleotide sequence encodes the soybean 
seed stearoyl-ACP desaturase precursor corresponding to nucleotides 70 to 1245 of SEQ ID NO:1, or any 
soybean nucleic acid fragment substantially homologous thereto encoding a functional stearoyl-ACP 
desaturase precursor. 

3. A nucleic acid fragment of Claim 2, wherein said nucleotide sequence encodes the mature soybean 
seed stearoyl-ACP desaturase, corresponding to nucleotides 166 to 1245 of SEQ ID NO:1. 

4. A chimeric gene capable of transforming a soybean cell, comprising a nucleic acid fragment of 
Claim 1 operably linked to suitable regulatory sequences producing antisense inhibition of 
stearoyl-ACP desaturase in the seed. 

5. A chimeric gene capable of transforming a plant cell of an oil-producing species, comprising a 
nucleic acid fragment of Claim 2 operably linked to suitable regulatory sequences giving rise to 
overexpression of said soybean seed stearoyl-ACP desaturase in the plastid of said plant cell. 

6. A chimeric gene capable of transforming a plant cell of an oil-producing species, comprising a 
nucleic acid fragment of Claim 3 operably linked to suitable regulatory sequences giving rise to 
expression of said mature soybean seed stearoyl-ACP desaturase in the cytoplasm of said plant cell. 

7. A method of producing soybean seed oil containing higher-than-normal levels of stearic acid, 
consisting of: 

(a) transforming a soybean cell with a chimeric gene of Claim 4, 

(b) growing fertile soybean plants from transformed soybean cells, 

(c) selecting progeny seeds from said fertile soybean plants for the desired levels of stearic acid, 
and 

(d) crushing said progeny seeds to obtain said soybean oil containing higher-than-normal levels of 
stearic acid. 

8. A method of producing oils from plant seeds containing lower-than-normal levels of stearic acid, 
consisting of: 

(a) transforming a plant cell of an oil-producing species with a chimeric gene of Claim 5 or 6, 

(b) growing sexually mature plants from said transformed plant cells of an oil-producing species, 

(c) selecting progeny seeds from said fertile plants for the desired levels of stearic acid, and 

(d) crushing said progeny seeds to obtain the oil containing lower-than-normal levels of stearic acid. 

9. A method of Claim 8, wherein said plant cell of an oil-producing species is selected from the group 
consisting of soybean, rapeseed, sunflower, cotton, cocoa, peanut, safflower and corn. 

10. A method of Claim 7, wherein said transforming step is carried out by a process selected from the 
group consisting of Agrobacterium infection, electroporation and high-velocity ballistic bombardment. 

11. A method of Claim 8, wherein said transforming step is carried out by a process selected from the 
group consisting of Agrobacterium infection, electroporation and high-velocity ballistic bombardment. 

12. A method of producing mature soybean seed stearoyl-ACP desaturase in microorganisms, consisting of: 

(a) transforming a microorganism with a chimeric gene of Claim 6, 

Description 

FIELD OF INVENTION 

The present invention relates to the use of perfect, compound simple sequence repeats (SSR) as 
self-anchoring primers for the identification and analysis of DNA sequence polymorphisms. More 
specifically, it has been observed that any one type of simple sequence repeat (SSR) in both plant and 
animal genomes often exists directly adjacent to an SSR of a different type, usually with perfect 
periodicity of one of the component nucleotides shared by both SSRs. This observation has allowed the 
design of self-anchoring primers in new variations of polymerase chain reaction-based multiplexed 
genome assays, including inter-repeat amplification and amplified fragment length polymorphism 
assays. These method variations collectively have been termed selective amplification of microsatellite 
polymorphic loci (SAMPL). 
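
To make the notion of a perfect compound SSR concrete, the sketch below scans a DNA string for two different dinucleotide repeats lying directly next to each other, such as (CA)n(TA)m, and reports which nucleotide keeps the same position in both repeat units. It is an illustration under stated assumptions and not the method of the invention: the function `find_compound_ssrs`, the minimum of four repeat units, and the test sequence are all hypothetical choices.

```python
import re


def find_compound_ssrs(seq, min_repeats=4):
    """Find adjacent dinucleotide repeats of two different units.

    Returns tuples (start, unit1, count1, unit2, count2, shared), where
    shared lists nucleotides occupying the same position in both units.
    """
    pattern = re.compile(
        r"(?P<u1>[ACGT]{2})(?P=u1){%d,}"            # first repeat, e.g. (CA)n
        r"(?P<u2>(?!(?P=u1))[ACGT]{2})(?P=u2){%d,}"  # a different unit, e.g. (TA)m
        % (min_repeats - 1, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        u1, u2 = m.group("u1"), m.group("u2")
        # Skip homopolymer runs such as AAAA, which are not dinucleotide repeats.
        if u1[0] == u1[1] or u2[0] == u2[1]:
            continue
        count1 = (m.start("u2") - m.start()) // 2
        count2 = (m.end() - m.start("u2")) // 2
        shared = [u1[i] for i in range(2) if u1[i] == u2[i]]
        hits.append((m.start(), u1, count1, u2, count2, shared))
    return hits


example = "GGTC" + "CA" * 5 + "TA" * 4 + "GGCT"
print(find_compound_ssrs(example))
```

In the example the A of each unit recurs at every second position across the junction, which is the kind of shared periodicity the text describes.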

BACKGROUND 

The ability to map eukaryotic genomes has become an essential tool for the diagnosis of genetic 
diseases, and for plant breeding and forensic medicine. An absolute requirement for elucidation of any 
genetic linkage map is the ability to identify DNA sequence variation. The realization that genetic 
(DNA) polymorphisms between phenotypically identical individuals are present and can be used as markers 
for genetic mapping has produced major advances in the art of developing eukaryotic linkage maps. 

Techniques for identifying genetic polymorphisms are relatively few and to date have been time 
consuming and labor intensive. One of the most common techniques is referred to as restriction fragment 
length polymorphism or RFLP (Botstein et al., Am. J. Hum. Genet. 32, 314 (1980)). Using RFLP technology, 
genetic markers based on single or multiple point mutations in the genome may be detected by 
differentiating DNA banding patterns from restriction enzyme analysis. As restriction enzymes cut DNA at 
specific target site sequences, a point mutation within this site may result in the loss or gain of a 
recognition site, giving rise in that genomic region to restriction fragments of different length. 
Mutations caused by the insertion, deletion or inversion of DNA stretches will also lead to a length 
variation of DNA restriction fragments. Genomic restriction fragments of different lengths between 
genotypes can be detected with region-specific probes on Southern blots (Southern, E. M., J. Mol. Biol. 
98, 503 (1975)). The genomic DNA is typically digested with nearly any restriction enzyme of choice. The 
resulting fragments are electrophoretically size-separated, transferred to a membrane, and then 
hybridized against a suitably labelled probe for detection of fragments corresponding to a specific 
region of the genome. RFLP genetic markers are particularly useful in detecting genetic variation in 
phenotypically silent mutations and serve as highly accurate diagnostic tools. RFLP analysis is a useful 
tool in the generation of codominant genetic markers but suffers from the need to separate restriction 
fragments electrophoretically and often requires a great deal of optimization to achieve useful 
background to signal ratios where significant polymorphic markers can be detected. In addition, the RFLP 
method relies on DNA polymorphisms existing within actual restriction sites. Any other point mutations 
in the genome usually go undetected. This is a particularly difficult problem when assaying genomes with 
inherently low levels of DNA polymorphism. Thus, RFLP differences often are difficult to identify. 
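
The effect described above, where a point mutation inside a recognition site changes the restriction fragment lengths, can be sketched in a few lines. This is an illustration only: the two allele sequences are invented, EcoRI (G^AATTC) is used as an arbitrary example enzyme, and `fragment_lengths` is a hypothetical helper rather than part of any published RFLP protocol.

```python
def fragment_lengths(seq, site="GAATTC", cut_offset=1):
    """Lengths of the fragments produced by cutting a linear sequence.

    The enzyme is assumed to cut cut_offset bases into each occurrence of
    its recognition site (EcoRI cuts between G and AATTC).
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Two invented alleles of the same 72-base region. In allele_b a single
# A -> G substitution destroys the second recognition site.
allele_a = "A" * 20 + "GAATTC" + "C" * 30 + "GAATTC" + "T" * 10
allele_b = "A" * 20 + "GAATTC" + "C" * 30 + "GAGTTC" + "T" * 10

print(fragment_lengths(allele_a))
print(fragment_lengths(allele_b))
```

A probe for this region would see three fragments in one genotype and two in the other, while a mutation outside the recognition sites would leave both patterns identical, which is the limitation noted in the text.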

Another method of identifying polymorphic genetic markers employs DNA amplification using short primers 
of arbitrary sequence. These primers have been termed 'random amplified polymorphic DNA', or 'RAPD', 
primers (Williams et al., Nucl. Acids Res., 18, 6531 (1990) and U.S. 5,126,239; also EPO 543 484 A2, 
WO 92/07095, WO 92/07948, WO 92/14844, and WO 92/03567). The RAPD method amplifies either double or 
single stranded nontargeted, arbitrary DNA sequences using standard amplification buffers, dATP, dCTP, 
dGTP and TTP nucleotides, and a thermostable DNA polymerase such as Taq polymerase. The nucleotide 
sequence of the primers is typically about 9 to 13 bases in length, between 50 and 80% G+C in 
composition and contains no palindromic sequences. Differences as small as single nucleotides between 
genomes can affect the RAPD primer's binding/target site, and a PCR product may be generated from one 
genome but not from another. RAPD detection of genetic polymorphisms represents an advance over RFLP in 
that it is less time consuming, more informative, and readily adaptable to automation. The use of the 
RAPD assay is limited, however, in that only dominant polymorphisms can be detected; this method does 
not offer the ability to examine simultaneously all the alleles at a locus in a population. 
Nevertheless, because of its sensitivity for the detection of polymorphisms, RAPD analysis and 
variations based on RAPD/PCR methods have become the methods of choice for analyzing genetic variation 
within species or closely related genera, both in the animal and plant kingdoms. 
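
The primer criteria stated above (about 9 to 13 bases, 50 to 80% G+C, no palindromic sequences) can be expressed as a simple filter. The sketch below is a minimal illustration, not a primer design tool. In particular, the text does not define how long a palindrome must be to disqualify a primer, so the 6-base threshold is an assumption, as are the function names and the example primers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(primer):
    return sum(base in "GC" for base in primer) / len(primer)


def has_palindrome(primer, core=6):
    """True if any window of `core` bases equals its own reverse complement.

    Any longer even-length palindrome contains such a window at its centre,
    so checking one even window size is sufficient. The value 6 is an assumption.
    """
    return any(
        primer[i:i + core] == reverse_complement(primer[i:i + core])
        for i in range(len(primer) - core + 1)
    )


def is_rapd_like(primer, min_len=9, max_len=13, gc_low=0.5, gc_high=0.8):
    """Apply the length, G+C and palindrome criteria quoted in the text."""
    return (
        min_len <= len(primer) <= max_len
        and gc_low <= gc_fraction(primer) <= gc_high
        and not has_palindrome(primer)
    )


print(is_rapd_like("GGTGCGGGAA"))  # 10 bases, 70% G+C, no 6-base palindrome
print(is_rapd_like("GGAATTCCGG"))  # contains the palindrome GAATTC
print(is_rapd_like("ATATTAAT"))    # too short and no G+C
```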

A third method more recently introduced for identifying and mapping genetic polymorphisms is termed 
amplified fragment length polymorphism or AFLP (M. Zabeau, EP 534,858). AFLP is similar in concept to 
RFLP in that restriction enzymes are used to specifically digest the genomic DNA to be analyzed. The 
primary difference between these two methods is that the amplified restriction fragments produced in 
AFLP are modified by the addition of specific, known adaptor sequences which serve as the target sites 
for PCR amplification with adaptor-directed primers. Briefly, restriction fragments are generated from 
genomic DNA by complete digestion with a single or double restriction enzyme combination, the latter 
using a "frequent" cutter combined with a "rare" cutter. Optimal results are obtained when one of these 
enzymes has a tetranucleotide recognition site, and the other enzyme a hexanucleotide site. Such a 
double enzyme digestion generates a mixture of single- and double-digested genomic DNA fragments. Next, 
double-stranded adaptors composed of synthetic oligonucleotides of moderate length (10-30 bases) are 
specifically ligated to the ends of the restriction fragments. The individual adaptors corresponding to 
the different restriction sites all carry distinct DNA sequences. 

One of the adaptors, usually the one corresponding to the hexanucleotide-site restriction enzyme, carries a biotin
moiety. The application of biotin-streptavidin capture methodology leads to the selective removal of all nonbiotinylated
restriction fragments (those bordered at both ends by the tetranucleotide restriction site), and thus effectively enriches
the population for fragments carrying the biotinylated adaptor at one or both ends. As a result, the DNA fragment
mixture is also enriched for asymmetric fragments, those carrying a different restriction site at each end. The selected
fragments serve as templates for PCR amplification using oligonucleotide primers that correspond to the adaptor/
restriction site sequences. These adaptor-directed primers also can include at their 3' ends from 1 to 10 arbitrary
nucleotides, which will anneal to and prime from the genomic sequence directly adjacent to the restriction site on the
DNA fragment.
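
The double digestion and biotin-streptavidin selection described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the function names are hypothetical, each enzyme is assumed to cut at the first base of its recognition site (real enzymes cut at defined internal positions and leave overhangs), the terminal pieces of the linear sequence are ignored, and Pst I (CTGCAG) and Taq I (TCGA) are used as the rare and frequent cutters only because they appear later in this document.

```python
import re

def double_digest(seq, rare_site, frequent_site):
    """Cut seq at every occurrence of either site.

    Returns a list of (fragment, left_end, right_end) tuples, where each end is
    labeled "rare" or "frequent". Simplifying assumption: the cut is placed at
    the first base of the recognition site, and terminal pieces are dropped.
    """
    cuts = []
    for label, site in (("rare", rare_site), ("frequent", frequent_site)):
        for match in re.finditer(site, seq):
            cuts.append((match.start(), label))
    cuts.sort()
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        fragments.append((seq[start:end], left, right))
    return fragments

def biotin_select(fragments):
    """Keep fragments with the (biotinylated) rare-cutter adaptor on at least one end."""
    return [f for f in fragments if "rare" in (f[1], f[2])]

# Toy sequence: PstI ... TaqI ... TaqI ... PstI
seq = "AA" + "CTGCAG" + "GGGG" + "TCGA" + "CCCC" + "TCGA" + "TT" + "CTGCAG" + "AA"
fragments = double_digest(seq, rare_site="CTGCAG", frequent_site="TCGA")
selected = biotin_select(fragments)
```

In this toy case the middle fragment, bordered at both ends by the frequent cutter, is the one removed by selection, which mirrors the enrichment for asymmetric fragments described in the text.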

A PCR reaction using this type of pooled fragment template and adaptor-directed primers results in the co-amplification
of multiple genomic fragments. Any DNA sequence differences between genomes in the region of the restriction
sites or the 1-10 nucleotides directly adjacent to the restriction sites lead to differences, or polymorphisms (dominant
and codominant), in the PCR products generated. Multiple fragments are simultaneously co-amplified, and some
proportion of these will be polymorphic between genomes.

A fourth method of assaying polymorphisms has involved utilizing the high degree of length variation resulting from
certain repeating nucleotide sequences found in most genomes. Most if not all eukaryotic genomes are populated with
repeating base sequences variously termed simple sequence repeats (SSR), simple sequence length polymorphisms
(SSLP), dinucleotide, trinucleotide, tetranucleotide, or pentanucleotide repeats, and microsatellites. Simple sequence
repeats have been demonstrated to be useful as genetic markers (DE 38 34 636 A, Jackle et al.; Weber et al., Am J
Hum Genet 44, 388 (1989); Litt et al., Am J Hum Genet 44, 397 (1989)). Weber et al. (Genomics 11, 695 (1991))
have successfully used SSRs for comparative analysis and mapping of mammalian genomes, and several groups
(Akkaya et al., Genetics 132, 1131 (1992); Morgante & Olivieri, Plant J. 2, 175 (1993); Wu & Tanksley, Mol. Gen. Genet.
241, 225 (1993)) have demonstrated similar results with plant genomes. SSR polymorphisms can be detected by PCR
using minute amounts of genomic DNA and, unlike RAPDs, they provide codominant markers and can detect a high
degree of genetic polymorphism (Weber, J. L. (1990) In Tilghman, S., Davies, K., (eds) 'Genome Analysis vol. 1: Genetic
and physical mapping', Cold Spring Harbor Laboratory Press, pp 159-181).

Although SSR-directed PCR primers are highly effective for detecting polymorphism, their use suffers from a variety
of practical drawbacks. Typically, markers generated by these methods are obtained by first constructing a genomic
library, screening the library with probes representing the core elements of a particular repeat sequence, purifying and
sequencing the positive clones, and synthesizing the primers specific for the flanking sequences for each cloned SSR
locus. Genomic DNA is then amplified to screen for polymorphisms, and mapping of the genome is then carried out.
The entire process is time consuming, expensive and technically demanding, and as a result has been somewhat
limited in its application.

At least one method has been developed as an attempt to circumvent these limitations and allow the use of polymorphic
SSR markers more directly, with no a priori knowledge of particular SSR locus sequences. Zietkiewicz et
al. (Genomics 20, 176 (1994)), for example, demonstrate that a single-primer PCR amplification can be used to
detect length polymorphisms between adjacent (CA)n repeats in animal and plant genomes. The PCR primers used
for this assay each contain a particular SSR sequence that is flanked immediately 5' or 3' by 2 to 4 nucleotides of
known or arbitrary sequence; these anchor sequences anneal to the non-SSR genomic sequences that flank the SSR
sequence in the genome and serve to "anchor" the primer to a single position at each matching SSR locus. Radiolabeled
SSR-to-SSR amplification products, generated when adjacent SSR sequences are oppositely oriented and spaced
closely enough in the genome, are analysed by gel electrophoresis followed by autoradiography. This approach
eliminates the need for cloning and sequencing SSRs from the genome, and reveals an enriched polymorphic banding
pattern relative to single-locus SSR. Zietkiewicz et al. attribute the enriched pattern to use of the arbitrary sequence
anchor, which allows the SSR primer to anneal and prime from many SSR target loci simultaneously.

In a concept similar to Zietkiewicz, Wu et al. (Nucleic Acids Res. 22, 3257 (1994)) teach a method for the detection
of polymorphisms where genomic DNA is amplified by asymmetric thermally cycled PCR using radiolabeled 5' anchored
primers consisting of microsatellite repeats in the presence of RAPD primers of arbitrary sequence. The method of Wu
et al. is useful for the generation of genetic markers that incorporate many features of microsatellite repeats. Wu et al.
do not disclose the use of compound microsatellite repeat primers for amplification.

Simple sequence repeats may be classified in several ways. In categorizing and characterizing human (CA)n or
(GT)n repeats, Weber, J. L. (Genome Analysis Vol. 1, 159, 1990, Cold Spring Harbor Laboratory Press, NY) defines at
least three types of (CA)n SSR: simple perfect SSR, simple imperfect SSR, and compound SSR. Each perfect SSR
is considered to be a simple (CA)n tandem sequence, with no interruptions within the repeat. Imperfect SSR are defined
as those repeating sequences with one or more interruptions of up to 3 nonrepeat bases within the run of the repeat.
Compound SSR are defined as those sequences with a CA or GT repeat stretch adjacent to or within 3 nucleotides of
a block of short tandem repeats of a different sequence. Weber notes that perfect sequence repeats in humans comprise
about 65% of the total (dC-dA)n(dG-dT)n sequences cloned from the genome, imperfect repeats about 25%, and compound
repeats about 10%. Weber theorizes that because perfect repeats contain the longest uninterrupted repeat
blocks, they appear to provide the most useful information. Weber also teaches that repeats composed of 12 or more
uninterrupted units are consistently more polymorphic than are shorter repeat stretches. Because imperfect repeats
generally contain shorter repeat stretches, they appear to be less useful as indicators of polymorphism. Compound
repeats in general have not been well characterized, and their potential informativeness has not been clearly established.

Others have used the polymorphisms detectable within perfect, imperfect and compound SSR loci to build genetic
linkage maps. Buchanan et al. (Mammalian Genome 4, 258 (1993)) teach that there is little difference in the utility of
the different SSR types in the ovine genome with respect to their absolute polymorphism levels; the perfect, imperfect
and compound repeats, although likely present in the genome at differing frequencies (perfect and imperfect simple
SSRs are more frequent than compound), were found to have similar average Polymorphism Information Content (PIC)
values as defined by Botstein et al. (Am. J. Hum. Genet. 32, 314 (1980)). In a study of (GT)n SSR in the Atlantic
salmon genome, Slettan et al. (Animal Genetics 24, 195 (1993)) found both perfect and imperfect simple SSR but no
compound repeats. In an examination of the equine genome, Ellegren et al. (Animal Genetics 23, 133 (1993)) identified
the highest levels of polymorphism involving (TG)n and (TC)n repeats among horse genotypes using primers
designed to amplify perfect or imperfect simple repeats; although two of eight cloned (GT)n repeats were identified to
be compound in structure (one perfect, one imperfect), neither was characterized further. Condit & Hubbell (Genome
34, 66 (1991)), in characterizing large-insert clones carrying (AC)n and (AG)n repeats from tropical trees and maize,
found that 10-20% of inserts carrying one type of repeat also carried the other, and that many (AC)n sites also had
other two-base repeats adjacent or nearby. Finally, Browne et al. (Nucl. Acids Res. 20, 141 (1991)), in an attempt to
characterize (CA)n SSR sequences in the human genome by DNA sequencing with degenerate (CA)n primers, disclose
that 88% of their (CA)n repeats carried AT base pairs at one or both ends of the CA repeat.
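
For readers unfamiliar with the PIC values cited above, the measure introduced by Botstein et al. is commonly written as PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, where p_i is the frequency of allele i at the locus. The short Python sketch below computes it; the function name and the example allele frequencies are illustrative assumptions, not data from any of the cited studies.

```python
def pic(allele_freqs):
    """Polymorphism Information Content for one locus.

    allele_freqs: list of allele frequencies that sum to 1.
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    n = len(allele_freqs)
    homozygosity = sum(p * p for p in allele_freqs)
    correction = sum(
        2 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1 - homozygosity - correction

# Hypothetical loci: two equally frequent alleles vs. four equally frequent alleles
two_allele_pic = pic([0.5, 0.5])
four_allele_pic = pic([0.25, 0.25, 0.25, 0.25])
```

A locus with more, evenly distributed alleles scores higher, which is why highly length-variable SSR loci tend to have high PIC values.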

To date, the record in the literature would indicate that although it varies with each type of genome, the incidence
of compound SSRs in a genome is lower than that of either perfect or imperfect simple SSR sequences. Nevertheless,
the information content (PIC value) of compound SSR sequences has been shown to be generally high. In addition,
the literature would indicate that the detection of genetic polymorphisms by way of specifically isolating compound
SSR loci generally would have marginal success; the use of probes or primers designed to recognize and thus specifically
isolate individual compound SSR loci would be less efficient for generating large numbers of new SSR markers
as compared to the isolation of the more numerous simple SSR sequences. Applicants have, however, unexpectedly
discovered that compound SSRs, particularly those containing (AT)n repeats, are highly polymorphic in eukaryotic
genomes, and that oligonucleotides designed to anneal specifically to a specific type of SSR, termed herein a perfect
in-phase compound SSR, are particularly useful in the generation and detection of polymorphisms between eukaryotic
genomes. A "perfect" compound SSR is one in which two different repeating sequences, each of which could be composed
of di-, tri-, tetra-, or penta-nucleotide units, are located very near each other, with no more than 3 intervening
bases between the two repeat blocks. One category of perfect compound repeat is one in which the two constituent
repeats are immediately adjacent to one another, with no intervening bases. Further, a perfect compound SSR can be
classified to be "in-phase" if both of the component simple repeats share a common nucleotide whose spacing is
conserved across the repeat junction and over the length of the two repeat blocks. For example, the in-phase perfect
compound SSR, (AT)n(AG)n, maintains the adenosine base "in-phase" across both components of this perfect compound
structure.
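
The "in-phase" property can be illustrated with a small Python check. This is a sketch under simplifying assumptions that are not taken from the specification: both repeat blocks are directly adjacent, both consist of whole repeat units, and units of unequal length are treated as never in-phase. The function names are hypothetical.

```python
def in_phase(unit_a, unit_b):
    """True if two equal-length repeat units carry the same base at the same offset.

    With directly adjacent blocks of whole units, a base shared at the same
    offset keeps constant spacing across the junction.
    """
    if len(unit_a) != len(unit_b):
        return False  # simplifying assumption, see lead-in
    return any(x == y for x, y in zip(unit_a, unit_b))

def base_positions(unit_a, n_a, unit_b, n_b, base):
    """Positions of `base` in the compound repeat (unit_a)n_a(unit_b)n_b."""
    seq = unit_a * n_a + unit_b * n_b
    return [i for i, c in enumerate(seq) if c == base]

# (AT)3(AG)3 = ATATATAGAGAG: adenine stays on every second position
at_ag = base_positions("AT", 3, "AG", 3, "A")
# (AT)3(GA)3 = ATATATGAGAGA: adenine shifts by one base at the junction
at_ga = base_positions("AT", 3, "GA", 3, "A")
```

Here `in_phase("AT", "AG")` is true and `in_phase("AT", "GA")` is false, matching the periodicity seen in the two position lists.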

Applicants have discovered that in-phase perfect compound SSR sequences such as (dC-dA)n(dT-dA)n are abundant
in both animal and plant genomes. Although the frequency of occurrence of each type of perfect compound SSR
sequence varies within, as well as between, species, those sequences that are in-phase are of sufficiently high frequency
in all eukaryotic genomes examined, and appear to be both well dispersed and highly polymorphic. Based
upon their observation that the junction spanned by such directly adjacent, in-phase perfect repeats is absolutely predictable,
Applicants have developed methodology which utilizes synthetic oligonucleotides containing in-phase compound
sequences as self-anchoring primers in new variations of polymerase chain reaction-based multiplexed genome
assays, including inter-repeat amplification and amplified fragment length polymorphism assays. Applicants have found
that the 5' end of the compound SSR primer serves as an extremely efficient anchor base for primer extension that
occurs from the 3'-end repeat. This primer extension initiates from inside the compound SSR target sequence, such
that any length variation between different alleles at a target SSR locus is detectable as a corresponding length variation
in the resulting amplification products. Because such use of perfect compound SSRs as amplification primers generates
multiple products wherein a high proportion are polymorphic (as high as 80%), Applicants believe that the method of
their invention greatly facilitates the simultaneous identification of multiple genomic polymorphisms, both codominant
and dominant. Thus, the present method offers great advantage in identifying polymorphic markers linked to genetic
traits of interest, and also offers an efficient and convenient generic technique for genome fingerprinting and whole-genome
comparisons.
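
The self-anchoring behaviour described above can be sketched with a toy model in Python. All sequences, the primer, and the function name are hypothetical, and the model is a string search rather than a simulation of real annealing: the compound primer can only match at the junction between the two repeat blocks, so the distance from that fixed site to a downstream restriction site changes with the length of the 3'-most repeat.

```python
def product_length(allele, primer, downstream_site):
    """Distance from the primer's anneal site to the next downstream site.

    Returns None if the primer or the site is not found. Exact string matching
    stands in for primer annealing (a simplifying assumption).
    """
    start = allele.find(primer)
    if start < 0:
        return None
    end = allele.find(downstream_site, start + len(primer))
    if end < 0:
        return None
    return end - start

primer = "CACA" + "TATA"          # toy compound primer spanning the CA/TA junction
flank = "GCGTTC" + "TCGA"         # toy flanking sequence ending in a Taq I site
allele_1 = "GG" + "CA" * 6 + "TA" * 5 + flank
allele_2 = "GG" + "CA" * 6 + "TA" * 8 + flank   # three extra TA units

len_1 = product_length(allele_1, primer, "TCGA")
len_2 = product_length(allele_2, primer, "TCGA")
```

The two alleles differ by three TA units, and the modeled products differ by the corresponding six bases, which is the kind of length difference that would be scored as a codominant polymorphism on a gel.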

SUMMARY OF THE INVENTION 

This invention provides an improved method of detecting polymorphisms between two individual nucleic acid samples
comprising amplifying segments of nucleic acid from each sample using primer-directed amplification and comparing
the amplified segments to detect differences, the improvement comprising wherein at least one of the primers
used in said amplification consists of a perfect compound simple sequence repeat. In a preferred embodiment, the
compound primer is in-phase.

In a most preferred embodiment the present invention provides a method for the detection of genetic polymorphisms
using a combination of in-phase perfect compound SSR primers and synthetic adaptor-directed primers for
PCR amplification from restriction enzyme digested genomic DNA templates to which fixed-sequence adaptors have
been ligated.

The present methods are particularly useful in the areas of clinical genetic diagnostics, forensic medicine (where
it is important to detect small polymorphic changes in nucleic acid composition), as well as in the areas of animal and
plant breeding and gene mapping. As specific applications, these methods have great utility for genome fingerprinting,
polymorphic marker identification (i.e., "marking" a phenotypic trait), and germplasm comparisons.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 illustrates a schematic representation of SSR-to-adaptor amplification. Panel a depicts the restriction
enzyme double digestion, adaptor ligation, and biotin-streptavidin selection process for generating the DNA template
mixtures, as well as the selective nature of a specific SSR primer for exponential SSR-to-adaptor amplification from a
complex template DNA mixture. Panel b depicts the selective nature of the adaptor-directed primer, in this case containing
at its 3' end one nondegenerate selective nucleotide, for discriminating template fragments that otherwise would
be amplified by a common SSR primer. Diagonally hatched boxes indicate the biotinylated adaptor corresponding to
a restriction enzyme with a 6-bp recognition site, and dark boxes depict the nonbiotinylated adaptor corresponding to
an enzyme with a 4-bp recognition site. The biotin moiety is indicated by a solid circle. The vertically striped box indicates
either a simple SSR or a compound SSR of a particular type that matches the SSR primer used for the PCR amplification.
Arrows depict PCR primers, with the arrowhead showing the direction of primer extension; solid/dark arrows
indicate adaptor-directed PCR primers, and vertically striped-hatched arrows indicate primers corresponding to the
SSR sequence depicted on the template fragments. Only the SSR-directed primer is tagged with either a fluorescent
or radiolabel, as indicated by *. Panel c depicts a perfect, in-phase compound SSR as a double-stranded locus in the
genome ((AT)x(AG)y, here where x=11.5 and y=10), which can serve as a target site for two classes of primer, each
representing one strand of the double-stranded target locus. Individual primers within each class can differ by the
relative length of each constituent repeat. The two classes of primer initiate primer extension in opposite directions
(small arrows). In every case, the primer anneals to a fixed site at the target, and primer extension initiates inside the
SSR region. In each case, any length variation in the 3'-most repeat between genomes could be detected as a codominant
polymorphism using SSR-to-adaptor amplification.

Figure 2 illustrates an autoradiograph of a denaturing polyacrylamide gel that compares the co-amplified products
from SSR-to-adaptor reactions on Taq I + Pst I digested, adaptor-modified, biotin-selected template DNAs prepared
from four different Glycine max or Glycine soja genotypes (N, max N85-2176; No, max NOIR-1; W, max wolverine; S,
soja PI 81762). 33P-labeled simple SSR primers containing 3 bp degenerate anchors at their 5' ends [HBH(AG)8.5,
DBD(AC)7.5, HVH(TG)7.5] are paired with unlabeled Taq I adaptor-directed primers containing either zero (TaqAd.F)
or one (Taq.prS, Taq.prS) selective nucleotide at their 3' ends. Cold start amplifications employed either a constant
temperature (58°C) or touchdown (59°C final temperature) thermocycle profile (left and right panels, respectively). An
arrow indicates a likely codominant polymorphism.

Figures 3a and 3b illustrate autoradiographs of denaturing polyacrylamide gels that compare the co-amplified
products from SSR-to-adaptor reactions on Taq I + Pst I digested, adaptor-modified, biotin-selected template DNAs
prepared from 15 different soybean genotypes. 33P-labeled primers corresponding to perfect compound SSRs,
(TG)4.5(TG)4.5, (GT)7.5(AT)3.5 and (GA)7.5(TA)2.5 [panel a] or (TG)4.5(AG)4.5 and (TG)4.5(AG)4.5 [panel b], each are paired
with unlabeled Taq I adaptor primers containing either zero (TaqAd.F) or one (Taq.prS) 3'-selective nucleotide, under
cold start amplification conditions utilizing a touchdown (56°C final) thermocycle profile. In each set, lane 1, Glycine
max wolverine; lane 2, G. max NOIR-1; lane 3, G. max N85-2176; lane 4, G. max Harrow; lane 5, G. max CNS; lane
6, G. max Manchu; lane 7, G. max Mandarin; lane 8, G. max Mukden; lane 9, G. max Richland; lane 10, G. max
Roanoke; lane 11, G. max Tokyo; lane 12, G. max PI 54.610; lane 13, G. max Bonus; lane 14, G. soja PI 81762; lane
15, G. soja PI 440.913. The size distribution of products is similar to that in Figure 2. Lane 9 of the (TG)4.5(AG)4.5 + Taq.pr8
set is a misloading of an incorrect (non-soybean) sample.

Figure 4 illustrates an autoradiograph of a denaturing polyacrylamide gel that compares the co-amplified products
from SSR-to-adaptor amplifications performed on five different soybean genotypes (lane 1, G. max wolverine; 2, G.
max NOIR-1; 3, G. max N85-2176; 4, G. max Bonus; 5, G. soja PI 81762) prepared by digestion with Taq I combined
with either Hind III or Pst I (H+T or P+T respectively). Cold start amplifications using a 56°C touchdown thermocycle
profile utilized the 33P-labeled SSR primer, (CA)7.5(TA)2.5, in combination with the indicated unlabeled Taq I
adaptor-directed primer, TaqAd.F or Taq.PrS (zero or one 3'-selective nucleotide, respectively). X indicates a misloaded
(incorrect) lane.

Figure 5a illustrates an autoradiograph of a denaturing polyacrylamide gel demonstrating the segregation of polymorphic
co-amplification products of 66 F2 progeny from a cross between G. soja PI 81762 and G. max Bonus. A 33P-labeled
primer corresponding to the perfect compound SSR, (CA)7.5(TA)2.5, was paired with the 3'-selective nucleotide
Taq.pr6 adaptor-directed primers. Blank lanes are the result of 'missing data'. The scored polymorphic bands that
segregate in this population are indicated. B, Bonus parent; S, soja PI81762 parent. Figure 5b illustrates the map
positions of 6 of the polymorphic segregating amplification products contained in panel a, as determined by MAPMAKER
analysis of the products' respective segregation scores (contained in Table VI).

Figure 6a illustrates an autoradiograph of a denaturing polyacrylamide gel comparing amplifications using the
perfect compound SSR primer (CA)7.5(TA)2.5, paired with either the Taq.AdF or the more selective Taq.prS
adaptor-directed primer, on template DNAs derived from either 5 (wolverine, NOIR-1, N85-2176, Bonus, PI 81762) or 15
soybean cultivars (same ordering of genotypes as in Figures 4 and 3a, respectively), and 6 mammalian individuals (one rat,
four human, one mouse BALB/C). All biotin-selected templates were prepared using Taq I combined with either Hind
III or Pst I. Figure 6b illustrates a similar comparison of the co-amplification products using (CT)7.5(AT)2 and
(GA)7.5(TA)2 perfect compound SSR primers, from Pst I + Taq I prepared template DNAs. The size distribution of products is
similar to that in panel a. Figure 6c illustrates the SSR-to-adaptor amplification products generated from Taq I + Hind
III prepared templates of Zea mays (corn) and salmon DNA templates, in comparison to those from soybean Bonus
and PI81762, using (CA)7.5(TA)2.5 paired with Taq.pr6 primer. Lane 1, Z. mays B73; lanes 2 and 3, individual salmon
sources; lane 4, Z. mays CM27; lane 5, Z. mays T232; lane 6, Z. mays DE811ASR; lane 7, Z. mays LH132; lanes 8-10,
two F2 individuals from a G. max Bonus x G. soja PI 81762 cross. The distribution of product sizes is similar to that
shown in Figure 6a.

Figure 7 illustrates an autoradiograph of a denaturing polyacrylamide gel comparing the co-amplification products
from Taq I + Pst I prepared soybean templates (S, soja PI81762; W, Wolverine; B, Bonus) using (CA)7.5(TA)2.5 paired
individually with Taq I adaptor-directed primers carrying either zero (Taq.AdF) or one specific 3'-selective nucleotide
(Taq.prS, .pr6, .pr7, .pr8).

Figure 8 illustrates an autoradiograph of a denaturing polyacrylamide gel comparing SSR-to-adaptor amplification
from wolverine and PI 81762 soybean cultivars using SSR primers representing the complementary strands of a
perfect compound SSR double-stranded sequence [(AT)x(GT)y:(CA)x(TA)y] paired separately with three different Taq
I adaptor-directed primers, Taq.AdF, Taq.pr6 and Taq.pr8. Each strand of the double-stranded compound SSR sequence
is represented by three primers that differ by the relative lengths of each of the two constituent repeats within
the primer. The (AT)8.5(GT)2.5 and (AT)6.5(GT)4.5 primers were completely inefficient (no amplification products
generated; data not shown) and (AT)3.5(GT)6.5 was moderately successful (shown), in comparison to the three more
efficient (CA)x(TA)y primer types.

Figure 9 illustrates autoradiographs of denaturing polyacrylamide gels that compare the co-amplification products
from soybean cultivars, wolverine and PI 81762, using cold start (left) and hot start (right) methods for the initiation of
thermocycling. For both methods, the perfect compound SSR primer, (CA)7.5(TA)2.5, is paired with each of the three
Taq I adaptor-directed primers indicated.

Figure 10a illustrates an autoradiograph of a denaturing polyacrylamide gel comparing the co-amplification products
from soybean (B, Bonus; S, soja PI81762) and corn (b, B73; c, CM37) cultivars amplified using DBD(AC)6.5 in
combination with no second adaptor primer or with Taq.prS. These templates are in the form of either intact, undigested
DNA or Taq I + Pst I digested DNA (P+T), as indicated.

Figure 10b illustrates autoradiographs of denaturing polyacrylamide gels comparing the amplification products
obtained using 33P-labeled 5'-anchored simple SSR primers [DBD(AC)7.5 and HBH(AG)8.5] or perfect compound SSR
primers [(AT)3.5(AG)7.5 and (AT)3.5(GT)6.5] in single-primer amplifications (no adaptor primer used) from both undigested
and Taq I + Pst I digested, biotin-selected template DNAs from soybean wolverine (W) and PI81762 (S) cultivars.
Cold start amplifications used either constant temperature (58°C) or 56°C touchdown annealing profiles.

Figure 11 is a schematic representation of the cloning and sequencing of a chosen SSR-to-adaptor amplification
product, and its conversion into a defined, single-locus marker. A genomic restriction fragment carrying the targeted
SSR repeat is bordered at both ends by restriction site-specific adaptors, Ad1 and Ad2. This fragment serves as the
template for PCR amplification using an SSR-directed primer and a primer corresponding to one of the adaptors. This
amplification product is purified and sequenced, and a locus-specific flanking primer (lsfp-1) is designed. This lsfp-1
primer then is paired with a primer corresponding to the other adaptor, for PCR amplification using the adaptor-modified
restriction fragment mixture as template. The specific product obtained is then isolated and sequenced, and a second
primer (lsfp-2) corresponding to the unique flanking sequence on the other side of the SSR is designed. The lsfp-1 +
lsfp-2 primer pair uniquely defines this SSR locus, and can be used to amplify directly from genomic DNA to visualize
SSR length polymorphism at this locus. Applicants have provided 89 sequence listings in conformity with 37 C.F.R.
1.821-1.825 and Appendices A and B ("Requirements for Application Disclosures Containing Nucleotides and/or Amino
Acid Sequences").

DETAILED DESCRIPTION OF THE INVENTION 

As used herein the following terms may be used for interpretation of the claims and specification. 

The term "simple sequence repeat (SSR)" or "microsatellite repeat (MS)" or "short tandem repeat" or "dinucleotide
repeat" or "trinucleotide repeat" or "tetranucleotide repeat" or "microsatellite" or "simple sequence length
polymorphisms" (SSLP) all refer to stretches of DNA consisting of tandemly repeating di-, tri-, tetra-, or penta-nucleotide units.
An SSR region can be as short as two repeating units, but more frequently is in excess of 8-10 repeating units. Simple
sequence repeats are common in virtually all eukaryotic genomes studied and have been identified as useful tools for
the study of genetic polymorphisms.

Classification of SSR loci or SSR sequences as used herein is based upon (but not identical to) the definitions
suggested by Weber (Genomics 7, 524 (1990)) for the categorization of human (CA)n dinucleotide repeats.

The term "simple SSR" will refer to a region comprised of at least three or more of the same tandemly repeated
di-, tri- or tetranucleotide sequence, which is not adjacent in the genome to any other different simple SSR. "Not
adjacent" means not closer than four nucleotides away on either side.

The term "compound SSR" refers to a region consisting of two or more different simple SSR sequences which are
adjacent. "Adjacent" means that differing simple SSRs are separated from one another by three or fewer consecutive
nonrepeat nucleotides.

The term "perfect SSR" refers to a simple SSR wherein every simple repeating unit within the SSR is intact and 
uninterrupted by nonrepeat nucleotides. 

The term "imperfect SSR" refers to a simple SSR wherein one or more of the constituent repeat units is interrupted
at least once within the SSR by three or four consecutive nonrepeat nucleotides.

The term "perfect compound SSR" refers to a compound SSR wherein the two constituent repeating SSR regions
are intact and uninterrupted by nonrepeat bases (i.e., they are perfect SSRs), and the two perfect SSR regions are
located very near each other, with no more than 3 intervening bases between the two repeat blocks, for example directly
adjacent to one another, having no intervening nucleotides.

The term "in-phase" refers to a potential feature of a perfect compound SSR wherein both constituent SSR regions
share a common nucleotide that retains constant spacing spanning the junction of the two SSR regions.

The term "out of phase" refers to a potential feature of a perfect compound SSR wherein a nucleotide is common 
to the two or more constituent repeating regions, but it does not retain constant spacing or periodicity across the junction 
of the compound structure. 
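
The simple/compound distinction defined above can be sketched as a small Python scan. This is a hedged illustration only: the function names are hypothetical, it handles dinucleotide units only, it requires at least three tandem units per block (following the "simple SSR" definition), and it treats any two blocks with different unit strings as "different" repeats without normalizing reading frame or strand.

```python
import re

def find_dinucleotide_blocks(seq, min_units=3):
    """Return (start, end, unit) for each perfect dinucleotide repeat block."""
    pattern = r"([ACGT]{2})\1{%d,}" % (min_units - 1)
    blocks = []
    for match in re.finditer(pattern, seq):
        unit = match.group(1)
        if unit[0] != unit[1]:          # skip mononucleotide runs such as AAAAAA
            blocks.append((match.start(), match.end(), unit))
    return blocks

def find_compound_ssrs(seq, max_gap=3):
    """Pairs of neighbouring blocks with different units and at most max_gap bases between them."""
    blocks = find_dinucleotide_blocks(seq)
    compound = []
    for (s1, e1, u1), (s2, e2, u2) in zip(blocks, blocks[1:]):
        gap = s2 - e1
        if u1 != u2 and gap <= max_gap:
            compound.append((u1, u2, gap))
    return compound

adjacent = "GG" + "CA" * 5 + "TA" * 4 + "GGCC"        # blocks directly adjacent
separated = "CA" * 5 + "GGGCCC" + "TA" * 4            # six intervening bases
```

Under these assumptions `find_compound_ssrs(adjacent)` reports one compound locus with a gap of zero, while `find_compound_ssrs(separated)` reports none, so the two blocks there would count as separate simple SSRs.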

The term "polymorphism" refers to a difference in DNA sequence between or among different genomes or individuals.
Such differences can be detected when they occur within known or tagged genomic regions. A "dominant
polymorphism" is a DNA difference that is detectable only as the presence or absence of a specific DNA sequence at a
single locus. Methods to detect dominant polymorphisms are able to detect only one allele of the locus at a time, and
genomes homozygous versus heterozygous for the detectable allele are indistinguishable. A "codominant polymorphism"
is a DNA difference at a locus between genomes whereby multiple alleles at the locus each can be distinguished
even when in heterozygous combinations. Typically identifiable as mobility variants on electrophoretic gels, codominant
polymorphisms can produce additive, nonparental genotypes when present in heterozygous form. A dominant
polymorphism is most useful as a marker when it is in coupling with the trait it marks, whereas a codominant polymorphism
is equally useful when in coupling or in repulsion to a trait.
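
The practical difference between dominant and codominant scoring can be shown with a minimal Python sketch. The genotypes and band sizes below are hypothetical, and the two scoring functions are illustrative names rather than part of any described method.

```python
def dominant_score(genotype, detectable_allele):
    """Presence (1) or absence (0) of the single detectable allele."""
    return int(detectable_allele in genotype)

def codominant_score(genotype):
    """The set of distinct alleles (e.g. band sizes) visible for this genotype."""
    return tuple(sorted(set(genotype)))

# Hypothetical alleles scored as fragment lengths in bases
homozygous_a = (20, 20)
heterozygous = (20, 26)
homozygous_b = (26, 26)

dominant = [dominant_score(g, 20) for g in (homozygous_a, heterozygous, homozygous_b)]
codominant = [codominant_score(g) for g in (homozygous_a, heterozygous, homozygous_b)]
```

With dominant scoring the first two genotypes give the same result, whereas codominant scoring separates all three, including the heterozygote that shows both parental bands.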

The term "touchdown amplification" or "touchdown PCR" will refer to a specific thermocycling profile for the
polymerase chain reaction whereby the annealing temperature begins artificially high (or low) for the first few cycles,
then is incrementally lowered (or raised) for a specified number of successive cycles until a final, desirable annealing
temperature is reached. The remaining cycles of the multiple cycle profile are then performed at this final, touchdown
annealing temperature. Thermocycling using this strategy serves to reduce or circumvent spurious, nonspecific priming
during the initial stages of gene amplification, and imbalance between correct and spurious annealing is automatically
minimized.
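
A touchdown annealing schedule of the kind just defined can be generated as follows. This Python sketch is illustrative only: the function name is hypothetical, and the starting temperature, step size, and cycle count in the example are assumed values; only the 56°C final temperature echoes a profile mentioned in the figure descriptions.

```python
def touchdown_profile(start_temp, final_temp, step, total_cycles):
    """Annealing temperature for each cycle of a touchdown profile.

    The temperature moves from start_temp toward final_temp by `step` per
    cycle (downward or upward), then stays at final_temp for the rest.
    """
    direction = -1 if start_temp > final_temp else 1
    temps = []
    temp = start_temp
    for _ in range(total_cycles):
        temps.append(temp)
        if temp != final_temp:
            nxt = temp + direction * step
            # Clamp so the schedule never overshoots the final temperature
            temp = max(nxt, final_temp) if direction < 0 else min(nxt, final_temp)
    return temps

# Assumed example: start at 65 C, drop 1 C per cycle to 56 C, 30 cycles in total
profile = touchdown_profile(start_temp=65, final_temp=56, step=1, total_cycles=30)
```

In this example the first ten cycles step down from 65°C to 56°C and the remaining twenty cycles run at the final annealing temperature.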

The terms "hot start" and "cold start" will refer to a general choice of methodologies for initiating thermocycling for
a PCR amplification reaction. In a cold start amplification, all the reaction components are assembled simultaneously
at room temperature, prior to the first denaturation step. This approach allows for the possibility of spurious priming
and nonspecific amplification products resulting from primer annealing and primer extension at undesirably low temperatures.
In contrast, a hot start approach employs the deliberate omission of at least one key component from the
otherwise complete amplification reaction, thus preventing either primer annealing (if primer is omitted) or extension
(if either polymerase or nucleotides are omitted). After carrying out an initial high-temperature denaturation, the excluded
component is added, and thermocycling proceeds. A hot start amplification thus serves to reduce or eliminate
the production of nonspecific products that result from spurious primer extension at nonstringent temperatures.

"Nucleic acid" refers to a molecule which can be single stranded or double stranded, comprised of monomers
(nucleotides) containing a sugar, phosphate and either a purine or pyrimidine. In bacteria, lower eukaryotes, and in
higher animals and plants, "deoxyribonucleic acid" (DNA) refers to the genetic material while "ribonucleic acid" (RNA)
is involved in the translation of the information from DNA into proteins.

The terms "genomic DNA" or "target DNA" or "target nucleic acid" will be used interchangeably and refer to nucleic
acid fragments targeted for amplification or replication and subsequent analysis by the instant method for the presence
of SSR regions. Sources of genomic DNA will typically be isolated from eukaryotic organisms. Genomic DNA is amplified
via standard replication procedures using suitable primers to produce detectable primer extension products.

The term "restriction endonuclease" or "restriction enzyme" refers to an enzyme that recognizes a specific palindromic
base sequence (target site) in a double-stranded DNA molecule, and catalyzes the cleavage of both strands of the
DNA molecule at a particular base in every target site.

The term "restriction fragments" refers to the DNA molecules produced by digestion with a restriction endonuclease.
Any given genome may be digested by a particular restriction endonuclease into a discrete set of restriction fragments.
The DNA fragments that result from restriction endonuclease cleavage may be separated by gel electrophoresis and
detected, for example, by either fluorescence or autoradiography.

The term "restriction fragment length polymorphism (RFLP)" refers to differences in the genomic DNA of two closely
related organisms which are detected based upon differences in the pattern of restriction fragments generated by a
restriction endonuclease digestion of genomic DNA of the organisms. For example, a genome which contains a
polymorphism in the target site for a restriction endonuclease will not be cleaved at that point by the restriction
endonuclease. Alternatively, a nucleotide sequence variation may introduce a novel target site where none exists in the
other organism, causing the DNA to be cut by the restriction enzyme at that point. Additionally, insertions or deletions
of nucleotides occurring between two target sites for a restriction endonuclease in the genome of one organism will
modify the distance between those target sites. Thus, digestion of the two organisms' DNA will produce restriction
fragments having different lengths and will generate a different pattern upon gel electrophoresis.
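
By way of illustration only, the following Python sketch shows how the loss of a target site, or an insertion between two
target sites, changes the set of restriction fragment lengths. The three short sequences are hypothetical and are not
taken from this specification. The EcoRI recognition sequence GAATTC is used as the target site, and each cut is placed
at the start of the site for simplicity, which is not where the enzyme actually cleaves within its site.

```python
def digest(seq, site):
    """Return fragment lengths after cutting seq at the start of each occurrence of site."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length fragments (a site at position 0 would otherwise yield one).
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


SITE = "GAATTC"  # EcoRI recognition sequence

# Hypothetical sequences, invented for this example only.
reference = "TTTT" + SITE + "ACACACAC" + SITE + "GGGG"
site_lost = reference.replace(SITE, "GAGTTC", 1)               # substitution destroys the first site
insertion = "TTTT" + SITE + "ACACACACACAC" + SITE + "GGGG"     # 4 bp inserted between the two sites

for name, seq in [("reference", reference), ("site_lost", site_lost), ("insertion", insertion)]:
    print(name, digest(seq, SITE))
```

Each of the three sequences yields a different list of fragment lengths, which is the difference in pattern that would
be seen upon gel electrophoresis.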

The term "ligation" refers to the enzymatic reaction catalyzed by the enzyme T4 DNA ligase by which two
double-stranded DNA molecules are covalently joined together in their sugar-phosphate backbones via phosphodiester bonds.
Ligation can occur between two DNA molecules that each are bounded by blunt (nonstaggered) ends, but also can
occur if the two DNA molecules contain single-stranded overhanging ends that are complementary in sequence. In
general, both DNA strands of the two double helices are covalently joined together, provided that at each junction the
free 5' end of one of the DNA molecules carries a 5'-phosphate group. It is also possible to prevent the ligation of one of
the two strands, through chemical or enzymatic modification (for example, removal of the 5' phosphate) of one of the
ends, in which case the covalent joining would occur in only one of the two DNA strands.

The term "adaptor" will specifically refer herein to short, largely double stranded DNA molecules comprised of a
limited number of base pairs, e.g., 10 to 30 bp. Adaptors are comprised of two synthetic single-stranded oligonucleotides
having nucleotide sequences that are not intentionally represented by repetitive sequences in the genome of interest,
and also are in part complementary to each other. Under appropriate annealing conditions, the two complementary
synthetic oligonucleotides will form a partially double-stranded structure in solution. At least one of the ends of the
adaptor molecule is designed so that it is complementary to and can be specifically ligated to the digested end of a
restriction fragment.

The term "polymerase chain reaction" or "PCR" refers to the enzymatic reaction in which copies of DNA fragments
are synthesized from a substrate DNA in vitro (U.S. Pat. Nos. 4,683,202 and 4,683,195). The reaction involves the
use of one or more oligonucleotide primers, each of which is complementary to nucleotide sequences flanking a target
segment in the substrate DNA. A thermostable DNA polymerase catalyzes the incorporation of nucleotides into the
newly synthesized DNA molecules, which serve as templates for continuing rounds of amplification.

The term "DNA amplification" or "nucleic acid amplification" or "nucleic acid replication" or "primer extension" refers 
to any method known in the art that results in the linear or exponential replication of nucleic acid molecules that are 
copies of a substrate DNA molecule. 






EP 0 804 618 B1 



The term "primer" refers to a DNA segment that serves as the initiation point or site for the replication of DNA
strands. Primers generally will be single-stranded and will be complementary to at least one strand of the target or
substrate nucleic acid and will serve to direct nucleotide polymerization or primer extension using the targeted sequence
as a template. Primers may be used in combination with another primer to "flank" the target sequence in PCR, thus
forming a "primer set" or "primer pair". In general, primers are 14 to 40 nucleotides long and preferably are designed
so as not to form secondary structure or hairpin configurations. Specific requirements for primer size, base sequence,
complementarity and target interaction are discussed in the primer section of the detailed description of the invention.
The term "primer", as such, is used generally herein by Applicants to encompass any synthetic or naturally occurring
oligonucleotide that can hydrogen-bond specifically to a region of a substrate DNA molecule and functions to initiate
the nucleic acid replication or primer extension process; such processes may include, for example, PCR, or other
enzymatic reactions that employ single rather than multiple oligonucleotide initiators.

The term "anchor" or "anchor region" or "anchor portion" refers to a 3-20 nucleotide region of a primer designed to
hybridize with a DNA sequence which is immediately adjacent to a specified SSR sequence. The anchor region of a
primer may occur at either the extreme 5' or 3' end, and serves to affix the primer onto the target DNA at an adjacent
position relative to a specified SSR. This anchoring results in primer extension occurring from a fixed nucleotide at
each target site. The anchor sequence can be a nondegenerate sequence of either deliberate or arbitrary design, or
it can be a fully or partially degenerate sequence. The latter would be capable of annealing to the genomic DNA
sequences flanking a wide range of SSR sites in a genome. Optionally, the anchor portion of the primer may be 5'
end-labeled with a reporter molecule, typically a radioisotope, a fluorescent moiety or a reactive ligand.

The use of the term "arbitrary" when speaking of an individual nucleotide at each position in a DNA sequence
refers to selection based on or determined by unbiased means or seemingly by chance rather than by necessity or by
adherence to a predetermined sequence.

The term "non-degenerate" refers to the occurrence of a single, specified nucleotide type at a particular position
or at multiple specified positions in the linear ordering of nucleotides in a DNA polymer, usually an oligonucleotide or
a polynucleotide. Any nondegenerate nucleotide position can carry an intended base (either A, C, G or T) that is known,
for example, to correspond to a given template site, or it can carry an arbitrarily chosen base, which will correspond to
a target site that is not known a priori. A "non-degenerate oligonucleotide" means that every nucleotide position within
the DNA molecule is non-degenerate. The term "degenerate" refers to the occurrence of more than one specified
nucleotide type at a particular position in an oligonucleotide or polynucleotide. A specific oligonucleotide can be made
up of some positions that are nondegenerate and some positions that are fully or partially degenerate. "Fully degenerate"
indicates the presence of an equal mixture of the four possible nucleotide bases (A, C, G or T) at a particular nucleotide
position; "partially degenerate" indicates the presence of only two or three of the four possible bases at a particular
position. A "degenerate oligonucleotide" is one in which at least one position within it carries full or partial degeneracy;
such an oligo- or polynucleotide is a mixture of specific, nondegenerate DNA molecules, each of which represents a
single permutation of the nucleotide sequences possible by virtue of the degenerate base(s) specified in the linear
nucleotide sequence. An oligonucleotide with two fully degenerate positions, for example, would be a mixture of 4^2 = 16
different nondegenerate molecules; an oligonucleotide with three fully degenerate and two partially degenerate (three
bases) positions would comprise a mixture of 4^3 x 3^2 = 576 different non-degenerate molecules. Standard degeneracy
codes used herein are:


N or X    A, C, G or T
B         C, G or T [anything except A]
D         A, G or T [anything except C]
H         A, C or T [anything except G]
V         A, C or G [anything except T]
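
As a non-limiting illustration of how a degenerate oligonucleotide corresponds to a mixture of nondegenerate molecules,
the following Python sketch counts and enumerates that mixture using the degeneracy codes listed above. The two example
oligonucleotides are hypothetical. The function names are chosen for this example only.

```python
from itertools import product

# Degeneracy codes as listed above; plain bases map to themselves.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "X": "ACGT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def degeneracy(oligo):
    """Number of distinct nondegenerate molecules represented by a degenerate oligonucleotide."""
    count = 1
    for symbol in oligo:
        count *= len(CODES[symbol])
    return count


def expand(oligo):
    """List every nondegenerate sequence represented by a degenerate oligonucleotide."""
    return ["".join(bases) for bases in product(*(CODES[symbol] for symbol in oligo))]


# Hypothetical oligonucleotides, invented for this example.
print(degeneracy("ACNNGT"))      # two fully degenerate positions
print(degeneracy("NNNBDACAC"))   # three fully degenerate, two three-base positions
print(expand("ACNNGT")[:4])
```

Two fully degenerate positions give 4 x 4 = 16 molecules; three fully degenerate positions combined with two three-base
positions give 4^3 x 3^2 = 576.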



The term "reporter" or "reporter molecule" refers to any moiety capable of being detected via enzymatic means, 
immunological means or energy emission; including, but not limited to, fluorescent molecules, radioactive tags, light 
emitting moieties or immunoreactive or affinity reactive ligands. 

The term "binding pair" includes any of the class of specific inter-molecular or recognition immune-type binding 
pairs, such as antigen/antibody or hapten/anti-hapten systems; and also any of the class of nonimmune-type binding 
pairs, such as biotin/avidin; biotin/streptavidin; folic acid/folate binding protein; complementary nucleic acid segments; 
protein A or G/immunoglobulins; and binding pairs which form covalent bonds, such as sulfhydryl reactive groups 
including maleimides and haloacetyl derivatives, and amine reactive groups such as isothiocyanates, succinimidyl 
esters and sulfonyl halides. 

The present invention describes the design and use of self-anchoring primers for the detection of SSR genetic
markers. The general polymorphism detection method using these primers is termed selective amplification of
microsatellite polymorphic loci (SAMPL). The method of primer design is based on the observation that many compound
SSR sequences are composed of dinucleotide repeats wherein a single type of nucleotide is shared by both of the
directly adjacent constituent repeats and is maintained "in-phase" across the repeat junction and throughout the length
of the repeat. The present invention combines many of the advantages inherent to conventional, single-locus SSR
markers (i.e., high levels of polymorphism and high codominance potential), with the added benefits and convenience
offered by multiplexed genome assays. As with conventional single-locus SSR markers, the use of perfect compound
SSR sequences as self-anchored PCR primers enables the identification of dominant and codominant polymorphisms
between genomes. However, unlike conventional SSR analysis, this method requires no prior knowledge of the unique
sequences flanking individual SSR loci. Thus, no labor-intensive SSR marker discovery or locus identification is
necessary to use such compound SSR sequences as primers as we describe. Also, as with conventional SSR markers,
the in-phase subset of compound SSR sequences appears to be highly abundant and well dispersed in both plant and
animal genomes, as well as to be highly polymorphic between individual genomes. In contrast, the "out-of-phase"
subset of perfect compound SSR sequences is represented in these same genomes at much lower relative
frequencies. Therefore, the likelihood that the highly abundant, in-phase perfect compound SSR sequences can identify new
polymorphisms closely linked to loci of interest is extremely high. Additionally, PCR primers representing these
compound SSR sequences are self-anchoring, such that the 5'-most repeat serves as the anchor for primer extension by
the 3'-most of the two repeats, thus obviating the need to incorporate into these primers any additional degenerate or
fixed sequences as 5' or 3' flanking anchors. Therefore, conceivably every genomic locus harboring the same
compound SSR sequence would be expected to serve as a target site for simultaneous amplification by the single compound
SSR primer matching these target sites. Finally, and most preferred, the use of these primers can be incorporated into
several different genome assays to increase their versatility and informativity. For example, the use of these perfect,
in-phase compound SSR primers in modifications of the amplified fragment length polymorphism assay (AFLP; Zabeau,
EP 534,858) leads to an increase in the proportion of amplification products that are polymorphic and codominant
between even highly related genomes as compared to conventional AFLP methods. The versatility of these compound
SSR primers, in combination with the ability to fine-tune both the numbers and types of amplification products
achievable with the AFLP assay, offers a unique combination of benefits for the multiplexed analysis of complex plant and
animal genomes.

Applicants' modified AFLP invention is illustrated in Figure 1. Genomic DNA (I) carrying perfect in-phase compound
SSR sequences at some frequency of occurrence is digested with a single restriction enzyme, or with a combination
of two or more restriction enzymes. This Figure demonstrates the use of a double enzyme combination, one enzyme
having a hexanucleotide recognition site (hatched boxes) and the other a tetranucleotide site (dark boxes). It is also
within the scope of the invention to choose other combinations of multiple restriction enzymes, including but not limited
to the following combinations of restriction enzymes with the specified types (lengths) of recognition site:



Additional Enzyme combinations

two enzymes     three enzymes
4 + 4           4 + 4 + 4
5 + 5           5 + 4 + 4
6 + 6           6 + 4 + 4
4 + 5           5 + 4 + 5
5 + 6           6 + 4 + 5
4 + 8           6 + 4 + 6
5 + 8           5 + 5 + 5
                5 + 5 + 6
                6 + 6 + 6



In all cases, these multiple enzyme digestions produce a mixture of restriction fragments with all combinations of the
corresponding blunt or single stranded overhanging ends. Additionally, it is within the scope of the invention for a single
restriction enzyme to be used to produce fragments all sharing the same blunt or single-stranded overhanging ends.
In general, any enzyme with a 4-, 5-, 6- or 8-bp recognition site is suitable, providing the enzyme's activity is not affected
or inhibited by DNA methylation or other, nonmendelian modes of DNA modification within the enzyme site.

Next, double stranded adaptors A and B are constructed wherein Adaptor A anneals specifically to the single
stranded overhang produced by the hexanucleotide-site enzyme, and Adaptor B to the overhang left by the
tetranucleotide-site enzyme. Adaptors A and B are simultaneously ligated to the appropriate ends of all the restriction
fragments (II) in the digested DNA mixture using standard methods (also described in Table III). In an alternate embodiment,
either Adaptor A or Adaptor B may be conjugated with a member of a binding pair such as biotin (as part of a
biotin-streptavidin pair), allowing capture and isolation of a smaller subset of the genomic restriction fragments (III). Either
of the adaptors can be so modified; the proportion of genomic fragments then selected for is affected by the genomic
frequency of the specific restriction site recognized by the modified adaptor. For any given restriction enzyme
combination, this enrichment method categorically allows only a small fraction of the total number of restriction fragments
from a genome the opportunity to serve as template for the subsequent PCR amplification; however, this reduced
complexity is necessary for ensuring a manageable number of co-amplified products in the next step. The entire genome
can be examined through the use of multiple combinations of restriction enzymes for the generation of different sets
of enriched genomic fragments. Figure 1a illustrates a biotinylated hexanucleotide-site adaptor.

Alternatively, the complexity of the genomic fragment mixture can be selectively reduced by performing a
pre-amplification step prior to the final amplification, using a pair of unlabeled adaptor-directed primers. This primer pair
comprises two different primers, one corresponding to one adaptor sequence and the second primer to the other
adaptor sequence. Each primer carries one selective nucleotide at its 3'-end. Since the 3'-most position of each of
these +1 adaptor primers can be occupied by any one of the four DNA nucleotides, each adaptor can be represented
by 4 different primers. Furthermore, 4 x 4 = 16 different combinations of these +1 primers can be used against any
genomic fragment mixture to generate 16 different, nonoverlapping fragment subsets from each genome. Thus, the
pre-amplification enriches for a subset of the total mixture of genomic fragments, and different enriched subsets can be
generated from a single restriction fragment mixture by varying the specific primers used for the pre-amplification.
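
The arithmetic of the +1 pre-amplification can be sketched as follows, by way of example only. The two adaptor core
sequences are placeholders assumed for this illustration and are not taken from Table 1. The model is deliberately
simplified: it ignores the restriction-site remnant and treats a fragment as a single top-strand insert whose first base
faces adaptor A and whose last base faces adaptor B.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Placeholder adaptor-primer core sequences, assumed for illustration only.
CORE_A = "GACTGCGTACCAATTC"
CORE_B = "GATGAGTCCTGAGTAA"


def plus_one_primers(core):
    """The four +1 primers for one adaptor: the core plus one selective 3' nucleotide."""
    return [core + base for base in BASES]


# Every combination of one +1 primer per adaptor.
primer_pairs = list(product(plus_one_primers(CORE_A), plus_one_primers(CORE_B)))


def selecting_pair(insert):
    """Return the single +1 primer pair expected to amplify a fragment with this insert.

    The primer on adaptor B anneals to the opposite strand, so its selective base
    is the complement of the last top-strand base of the insert.
    """
    return CORE_A + insert[0], CORE_B + COMPLEMENT[insert[-1]]


print(len(primer_pairs))
print(selecting_pair("ACGGT"))
```

Because each insert maps to exactly one of the 16 primer pairs in this model, the 16 subsets do not overlap.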

No matter how the fragment enrichment is performed, a pair of PCR primers next are synthesized according to
standard protocols. One primer will be an adaptor-directed primer, designed to anneal to adaptor B specifically when
the biotin-mediated fragment enrichment method is employed. For enrichment via pre-amplification, this primer can
correspond to either of the two adaptors. In either case, this adaptor primer carries 1-4 randomly selected nucleotides
at its 3'-end.

The other primer will be a SSR-directed primer, designed to anneal specifically to a particular SSR sequence
represented on a subset of the genomic fragments. In a preferred embodiment, the SSR-directed primer is 5'-end
labeled, typically with a radionucleotide such as 32P or 33P or with a fluorescent moiety (*). It is further especially
preferred if the SSR-directed primer is of the "perfect compound" type wherein the primer straddles the compound
SSR, preferably with one nucleotide remaining "in-phase" across the length of the primer. Exponential amplification of
a subset of the adaptor modified restriction fragments in the presence of the adaptor-directed primer and the 5'-labeled
SSR primer generates labeled primer extension products (IV) from every input genomic fragment that carries the SSR
sequence and is bordered at the opposing end by the designated adaptor sequence.

This method generates multiple co-amplification products, a high proportion of which are expected to be
polymorphic between genomes. Those genomic fragments lacking either the designated SSR sequence or the appropriate
adaptor end, or both, will not be exponentially amplified, and therefore will not be detected (V).

GENERAL METHODS:

Primer Design: 

All oligonucleotide primers are synthesized using solid-phase phosphoramidite chemistry such as that described
by Operon Technologies, Alameda, CA. All primers are non-phosphorylated at their 5' ends and can be used in unpurified
form provided the syntheses are efficient. However, oligonucleotides purified by column chromatography are preferred for
optimal primer specificity. All oligonucleotide primer sequences are chosen such that the Tm in 50 mM salt (KCl or
NaCl) is between 38° and 45°C (as determined using the algorithm employed by Oligo v4.03 for the Macintosh, National
Biosciences, Inc.).

Several types of primers are used within the context of Applicants' invention. The first is an adaptor-directed primer
(as disclosed in Zabeau EP 534,858) which can vary from 15 to 25 nucleotides in length, and from 40% to 60% G+C.
Starting at its 5' end, the primer spans the length of, and is complementary to, one strand of a double-stranded adaptor
that is ligated to the restriction endonuclease-digested target DNA to be tested. The primer then covers all or part of
the restriction site, and its 3' end can carry arbitrary, nondegenerate bases that anneal to and prime from nucleotides
within the target DNA fragment adjacent to the adaptor. The sequence of such an adaptor-directed primer can vary,
depending upon the specific adaptor used for the construction of the template and upon the number of arbitrary,
selective nucleotides positioned at its 3'-end. Examples of DNA sequences and characteristics of oligonucleotides
comprising both adaptor and adaptor-directed primers, each specific to the site generated by a particular restriction enzyme,
are given in but not limited to Table 1.
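
For illustration only, the length and base-composition ranges stated above for an adaptor-directed primer can be checked
with a short Python sketch. The candidate sequence is hypothetical and is not taken from Table 1. The Tm criterion is
not evaluated here, because the algorithm employed by the Oligo software is not set out in this specification.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def fits_adaptor_primer_ranges(seq):
    """True if seq is 15 to 25 nucleotides long with 40% to 60% G+C."""
    return 15 <= len(seq) <= 25 and 0.40 <= gc_fraction(seq) <= 0.60


# Hypothetical candidate primer, invented for this example.
candidate = "GACTGCGTACCAATTCAC"
print(len(candidate), gc_fraction(candidate), fits_adaptor_primer_ranges(candidate))
```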




A second type of primer used within the present invention corresponds in its 3' portion to a simple sequence repeat,
or microsatellite, where the structure of the microsatellite is simple (simple SSR) as defined above. The simple
microsatellite region can be tandem repeats of mono-, di-, tri-, tetra- or pentanucleotides. The 5' portion of the primer
contains 3 to 5 fully or partially degenerate nucleotides, which serve to anchor the primer adjacent to a microsatellite
in the targeted genome. This primer can vary in length from 10 to 60 nucleotides, with a [G+C] content typically from
16% to 80%. Length polymorphism within a microsatellite locus between genomes is expected to be detectable using
these primers, since in nearly all cases, primer extension is expected to initiate from a fixed site relative to the SSR
target region. Primers similar to these are described by Zietkiewicz, E., et al., Genomics, 20, 176, (1994).

Another type of SSR-directed primer, uniquely utilized within the present invention, is the perfect compound SSR
primer, which is comprised of two different perfect SSRs that are immediately adjacent to one another with no
intervening nucleotides either between the repeats or within each of the repeats. Perfect compound SSR primers are
perfectly self-anchoring; that is, the simple SSR at the 5'-end of the primer serves as an efficient anchor for the adjacent
3' SSR from which primer extension proceeds (see Figure 1c). It is intended, therefore, that primer extension initiates
from a single, fixed site within a compound SSR target region. Any length variation between genomes in the portion
of the target SSR across which primer extension occurs should be visible as length variation in the resulting amplification
products from those genomes. The relative lengths of each constituent SSR within the compound primer can vary.
However, Applicants have found that the best primer anchoring and greatest specificity for the target template is
produced when the length of the 5' anchor is equal to or greater than that of the 3' priming portion.

For dinucleotide repeats, Applicants theorize that, excluding CG and GC combinations (long stretches of which
are thought to be rare in eukaryotic genomes), 90 different permutations of two adjacent dinucleotide sequences are
possible, as estimated by the following equation:

[(4)(3) - 2] x [(4)(3) - 2 - 1] = 10 x 9 = 90

The first (5'-most) constituent repeat may carry any of the four nucleotides (A, G, C or T) in its first position, followed
by any of the three remaining nucleotides at its second position; from this product should be subtracted the two
GC and CG combinations. For the second (3'-most) dinucleotide, the same calculation holds, but with the additional
subtraction of the one combination occupying the first constituent dinucleotide repeat position. All of these 90
permutations are listed in Table II. Only 80 are true compound repeat sequences, however; 10 of the permutations are actually
imperfect simple repeats (e.g., (CT)n(TC)n, (AG)n(GA)n, etc.). Similar calculations can be performed to estimate the
number and types of different tri-, tetra- and penta-nucleotide combinations possible by random nucleotide
arrangements.
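
The counts of 90 permutations, of which 10 are imperfect simple repeats and 80 are true compound repeats, can be
reproduced by direct enumeration. The following Python sketch is offered as an illustration only; the variable names
are chosen for this example.

```python
from itertools import permutations

BASES = "ACGT"
EXCLUDED = {"CG", "GC"}

# Dinucleotide repeat units: two different bases, excluding CG and GC.
dinucleotides = [a + b for a in BASES for b in BASES
                 if a != b and a + b not in EXCLUDED]

# Ordered (5' repeat, 3' repeat) pairs of two different repeat units.
ordered_pairs = list(permutations(dinucleotides, 2))

# A pair such as (CT)n(TC)n is the same simple repeat shifted by one base,
# so it is an imperfect simple repeat rather than a true compound repeat.
imperfect_simple = [(p, q) for p, q in ordered_pairs if q == p[::-1]]
true_compound = [(p, q) for p, q in ordered_pairs if q != p[::-1]]

print(len(dinucleotides), len(ordered_pairs), len(imperfect_simple), len(true_compound))
```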

The complete GenBank DNA sequence databases (version 84.0) were searched using the FindPatterns search
algorithm within the University of Wisconsin Genetics Computer Group sequence analysis package (version 7.3). The
individual strands of the double stranded sequences shown in the first column were used individually as queries either
against the entire GenBank database (all species combined) or against the separate subdatabases representing the
indicated phylogenetic groupings. For each query, such as (AC)x(AT)y, x and y were each designated to be ≥ 6; that is,
any hit in the database was required to carry at least 6 units of each of the two constituent dinucleotide repeats. The
final column designates the number of matches to the respective query within an in-lab collection of cloned soybean
SSR sequences (unpublished data), isolated as small inserts containing either (AC)x or (AG)y sequences.

Although all 80 compound dinucleotide sequences have the potential to exist in a genome and to serve as targets
for corresponding primers, only a small subset of these adjacent dinucleotide repeat combinations has been observed
to occur at a reasonable frequency among compound SSRs represented within the DNA sequence databases and
among SSRs cloned and sequenced from plant and animal genomes (see Table II). The great majority of these 80
total combinations occur at surprisingly low relative frequencies in eukaryotic genomes.

Nearly all of the compound repeats in this high frequency subset have perfect nucleotide periodicity whereby the
two adjacent constituent dinucleotide repeats share a common nucleotide that retains a constant periodic spacing
across the junction and the entire compound structure. These compound SSR sequences are designated by Applicants
as "in-phase", and include repeats such as (AT)x(AG)y, (AC)x(TC)y, and (AT)x(GT)y where x and y independently are
≥ 2 and can be multiples of 0.5. Thirty-two of the 80 possible dinucleotide combinations fall into this in-phase category,
and these are listed in Table II. All the remaining compound repeats are termed "out-of-phase". The 32 individual
in-phase compound sequences (excluding CG and GC), however, represent only 16 unique nonredundant single-stranded
sequences. The 2-fold redundancy derives from the possibility of positioning the repeating in-phase nucleotide as
either the first or the second base in the core repeats (i.e., (AC)x(AT)y and (CA)x(TA)y, although nonidentical as individual
primers, represent the same compound sequence). In addition, these 16 canonical, nonredundant sequences represent
the complementary strands of only 8 individual double-stranded compound repeat loci (see Table II). In other words,
a compound dinucleotide repeat as a locus in double stranded DNA could be recognized by any of four different single
stranded oligonucleotide primers, out of the total set of 32 possible permutations. For example, a locus (AT)6(AG)6 is
a target site for the four hypothetical primers (AT)x(AG)y, (TA)x(GA)y, (CT)x(AT)y, and (TC)x(TA)y (with x, y ≥ 6).
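
The reduction from 32 in-phase permutations to 16 nonredundant single-stranded sequences and 8 double-stranded loci can
be checked by enumeration. The following Python sketch is one possible way of doing so and is offered as an illustration
only. Each in-phase pair is summarized by an assumed key (shared base, other base of the 5' repeat, other base of the
3' repeat); that key and all function names are conventions of this example, not of the specification.

```python
from itertools import permutations

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

dinucleotides = [a + b for a in BASES for b in BASES
                 if a != b and a + b not in {"CG", "GC"}]


def in_phase_key(p, q):
    """Return (shared, other5, other3) if repeats p and q share a base in the same position."""
    if p[0] == q[0]:
        return (p[0], p[1], q[1])
    if p[1] == q[1]:
        return (p[1], p[0], q[0])
    return None


def primers_from_key(key):
    """The two phase-shifted primer notations for one single-stranded compound sequence."""
    shared, other5, other3 = key
    return {f"({shared}{other5})x({shared}{other3})y", f"({other5}{shared})x({other3}{shared})y"}


def reverse_complement_key(key):
    """Key of the complementary strand: repeats swap order and every base is complemented."""
    shared, other5, other3 = key
    return (COMPLEMENT[shared], COMPLEMENT[other3], COMPLEMENT[other5])


def primers_for_locus(p, q):
    """All four primer notations that target the double-stranded locus (p)n(q)n."""
    key = in_phase_key(p, q)
    if key is None:
        raise ValueError("repeats are not in-phase")
    return primers_from_key(key) | primers_from_key(reverse_complement_key(key))


in_phase = [(p, q) for p, q in permutations(dinucleotides, 2) if in_phase_key(p, q)]
single_stranded = {in_phase_key(p, q) for p, q in in_phase}
loci = {frozenset({k, reverse_complement_key(k)}) for k in single_stranded}

print(len(in_phase), len(single_stranded), len(loci))
print(sorted(primers_for_locus("AT", "AG")))
```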

By chance alone, and assuming no nucleotide bias in a source genome, each of the 80 different perfect compound
dinucleotide permutations would be expected to occur in the genome at equal frequencies. As mentioned above,
however, it was already discovered by Applicants that the in-phase subset is more abundant than the
out-of-phase set. Further, the two possible permutations of the constituent repeats for some of the in-phase compound
sequence combinations appear to be represented at widely differing frequencies in a given genome. For example, the
compound repeat (AT)≥6(AG)≥6 appears to be at least 5 times more abundant in plant genomes than its permutant
counterpart, (AG)≥6(AT)≥6; and (AC)≥6(AG)≥6 is much more abundant in primate genomes than (AG)≥6(AC)≥6. Both
from a systematic analysis of cloned sequence databases and from an empirical examination of both plant and animal
genomes, these few, most frequently occurring compound dinucleotide repeats are known by Applicants and therefore
are fully predictable. This knowledge serves to reduce to only a few the number of different compound SSR primers
that will be successful for producing an adequate number of SSR-to-adaptor co-amplification products using the present
invention.

Thus, of the 80 total compound dinucleotide sequence permutations that are possible by random arrangement of
nucleotides, only a few (nearly all of which are in-phase) are present in plant and animal genomes at a measurable
frequency, and only these few, therefore, are required to detect a large proportion of the compound SSR loci present
in any given genome. Experimental data demonstrate, however, that specific primers representing these 16 compound
sequences are not equally effective at recognizing and priming from the respective target locus. Such differences in
primer efficiency were determined empirically to result from each constituent repeat's base composition (AT-richness),
in combination with the relative position (5'-anchoring versus 3'-priming) of each constituent repeat within the primer.

Table II also lists the dinucleotide permutations for which the spacing of a shared nucleotide is not preserved
(termed "out-of-phase") or for which no nucleotide is shared between the two constituent repeats. The latter category
contains dinucleotide combinations that are both palindromic and nonpalindromic. Although not nearly as frequent in
eukaryotic genomes as the in-phase sequences, some of the out-of-phase compound SSR sequences nonetheless
appear to be present in most genomes. Therefore, primers that correspond to such out-of-phase repeats also are
expected to serve as initiation sites for primer-extension from their respective target loci in the genome. Preferred
anchored primers of the instant invention where primer nucleotide periodicity is not specifically designated may be
defined by formula I for dinucleotide repeats:

Formula I

5'-(XY)≤15 (NM)≤15-3'

where

X = A, C, T, or G; Y = A, C, T, or G; X ≠ Y
N = A, C, T, or G; M = A, C, T, or G; N ≠ M

and where XY ≠ NM.

Herein, these are abbreviated as (XY)#(NM)#; and by formula II for trinucleotide repeats:

Formula II

5'-(XYZ)≤10 (NMP)≤10-3'

where

X = A, C, T, or G; Y = A, C, T, or G; Z = A, C, T, or G
and X, Y, Z are not the same single base;

where

N = A, C, T, or G; M = A, C, T, or G; P = A, C, T, or G
and N, M, P are not the same single base;

and where XYZ ≠ NMP.

Generally the primers of formulae I and II will consist of oligonucleotides of 10-60 nucleotides in length that contain
two different, constituent simple sequence repeats that are directly adjacent to one another, with no intervening
nonrepeat nucleotides. From the 5' end, this oligonucleotide contains a simple sequence repeat of up to 15 repeat units
in length, followed immediately 3' by a second simple sequence repeat that is also up to 15 repeat units in length.
Preferred anchored primers of the instant invention where primer nucleotides are specifically designated to be
in-phase may be defined by the formula III for dinucleotide repeats:

Formula III:

This formula describes a subset of sequences covered broadly by Formula I.

5'-(XY)≤15 (XZ)≤15-3' or 5'-(YX)≤15 (ZX)≤15-3'

where

X = A, C, T, or G; Y = A, C, T, or G; Z = A, C, T, or G

but where Y ≠ X; Z ≠ X; and Y ≠ Z.
Herein, these are abbreviated as (XY)#(XZ)# or (YX)#(ZX)#.
Typically oligonucleotides of formula III are 10-60 nucleotides in length and contain two different, constituent simple
sequence repeats that are directly adjacent to one another, with no intervening nonrepeat nucleotides. From the 5'
end, this oligonucleotide contains a simple sequence repeat of up to 15 repeat units in length, followed immediately 3'
by a second simple sequence repeat that is also up to 15 repeat units in length. Each repeat shares a common nucleotide,
which is retained at a consistent periodicity across both constituent repeats, occupying either the first or the second
position within each repeat.
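
The constraints of formula III can be checked mechanically. The following is a minimal Python sketch, not part of the original disclosure; the function name `formula_iii_primer` and its parameters are hypothetical, and treating the typical 10-60 nucleotide length as a hard limit is an assumption made for illustration.

```python
BASES = set("ACGT")

def formula_iii_primer(x, y, z, n_anchor, n_ext, shared_first=True):
    """Build an in-phase compound SSR primer of formula III, written 5'->3'.

    shared_first=True  gives (XY)n(XZ)m; shared_first=False gives (YX)n(ZX)m.
    n_anchor and n_ext are the repeat-unit counts of the 5' and 3' repeats.
    """
    if not {x, y, z} <= BASES:
        raise ValueError("X, Y and Z must each be A, C, G or T")
    if len({x, y, z}) != 3:
        raise ValueError("formula III requires Y != X, Z != X and Y != Z")
    if not (1 <= n_anchor <= 15 and 1 <= n_ext <= 15):
        raise ValueError("each repeat may be at most 15 repeat units long")
    # The shared base X keeps the same position (first or second) in both repeats.
    first, second = (x + y, x + z) if shared_first else (y + x, z + x)
    primer = first * n_anchor + second * n_ext
    # Assumption: the 'typical' 10-60 nt range is enforced as a limit here.
    if not 10 <= len(primer) <= 60:
        raise ValueError("primer length must be 10-60 nucleotides")
    return primer
```

For example, `formula_iii_primer("A", "C", "T", 5, 3)` yields an (AC)5(AT)3 primer, and passing `shared_first=False` yields the corresponding (CA)5(TA)3 primer.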

In an especially preferred embodiment of Applicants' invention, PCR amplification to detect polymorphisms is
carried out using restriction-fragmented DNA modified with appropriate adaptors, wherein the primer pair consists
of one primer of the first type described above (Zabeau EP 534,858; an AFLP primer) and a second
primer that is one of Applicants' unique perfect compound SSR primers described above.

One of skill in the art will also appreciate that Applicants' unique compound SSR primers can also be used in
conjunction with a variety of other primer types, which include, for example, non-adaptor primers, primers of fixed
sequence, arbitrary primers or any primer that might hybridize with currently known or unknown dispersed repeated
sequences in the genome. An example particularly suited to the present invention would be where the other primer is
of completely arbitrary sequence such as a RAPD primer (Williams et al., Nucleic Acids Res. 18, 6531, (1990)).

RAPD primers have been used to generate polymorphic markers from the amplification of genomic DNA. Preferably
the nucleotide sequence of the RAPD primers would be about 9 to 10 bases in length, between 50 and 80% G+C in
composition and contain no palindromic sequences. Amplifications using RAPD primers alone are typically done using
short primers and low annealing temperatures, which maximizes the probability that several randomly distributed loci
on the genome will produce amplification products. Because the incidence of any particular RAPD binding site within
the genome is relatively low, this methodology would serve to restrict the amount of the genome that is subject to
amplification with any particular primer combination. It is contemplated that the selectivity of the RAPD methodology
would serve an enrichment function, similar to the selection function provided by use of a biotinylated adaptor in the
conventional AFLP method, to enrich for only a subset of randomly distributed genomic regions, from which a
manageable number of co-amplifications could then occur.
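
The stated preferences for a RAPD primer (about 9 to 10 bases, 50-80% G+C, no palindromic sequences) can be expressed as a simple filter. The sketch below is illustrative only: the function names are hypothetical, and the minimum palindrome length of 6 bases is an assumption, since the text does not say how short a self-complementary stretch must be to disqualify a primer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def has_palindrome(seq, min_len=6):
    """True if any window of at least min_len bases equals its own reverse complement.

    Odd-length windows can never match, so only even lengths are ever reported.
    """
    for size in range(min_len, len(seq) + 1):
        for start in range(len(seq) - size + 1):
            window = seq[start:start + size]
            if window == reverse_complement(window):
                return True
    return False

def is_preferred_rapd_primer(seq, min_pal=6):
    """Apply the three stated preferences; min_pal is an assumed threshold."""
    seq = seq.upper()
    return (len(seq) in (9, 10)
            and 0.5 <= gc_fraction(seq) <= 0.8
            and not has_palindrome(seq, min_pal))
```

The test sequences used with this sketch are arbitrary strings chosen to exercise each criterion, not primers from the disclosure.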

PREPARATION OF GENOMIC DNA: 

Restriction digested fragments:

Target DNA useful for amplification in the present invention was comprised of restriction fragments generated from
Taq I + Pst I or Taq I + Hind III digestion of eukaryotic genomic DNA, further modified by the ligation of specific adaptor
sequences. Genomic DNA was isolated from soybean and corn using either the CTAB/chloroform extraction and CsCl
centrifugation method of Murray and Thompson (Murray et al., Nuc. Acid Res., 8, 4321, 1980) or a urea extraction
miniprep procedure (Chen et al., The Maize Handbook, M. Freeling and V. Walbot, eds., (1993) pp 526-527, New York).
Mammalian and salmon genomic DNAs were purchased from commercial sources (Sigma (St. Louis, MO); Clontech
(Palo Alto, CA)). Genomic DNA was prepared for amplification reactions by complete restriction endonuclease digestion
followed by ligation of site-specific double-stranded adaptors. Methods for this type of adaptor design and construction
are well known in the art, and examples are given by Zabeau, EP 534,858.

It is preferred if a combination of two different restriction enzymes having 4-bp and 6-bp recognition sites,
respectively, is used for the preparation of target DNA. Examples of suitable restriction enzymes are Taq I and Pst I; however,
any restriction enzymes having 4-, 5-, 6- or even 8-bp recognition sites also are appropriate, provided their activities
are not inhibited by target site DNA methylation or other non-Mendelian mechanisms of selective nucleotide
modification. Any combination of such restriction enzymes is potentially suitable. Other restriction enzymes suitable for the
present invention may include but are not limited to the hexanucleotide site enzymes EcoRI, DraI or BamHI, the
tetranucleotide site enzymes Sau3AI, MboI, MseI, Tsp509I or AluI, the pentanucleotide site enzymes HinfI or AvaII,
and the octanucleotide site enzymes PmeI, PacI, or SwaI.

Genomic DNA is digested with a first enzyme such as Taq I, followed by further digestion with a second enzyme
such as Pst I, according to standard protocols, such as that given by Zabeau (EP 534,858). The digests generated
from each input genomic DNA are a mixture of symmetric fragments, bordered at both ends by either Taq I or Pst I
sites, and asymmetric fragments, each flanked by a Taq I site and a Pst I site.

Adaptors:

Double stranded adaptors are generated by annealing the two partially complementary single stranded component
oligonucleotides of each pair (examples are listed in Table I). Since restriction endonucleases cleave genomic DNA
molecules at specific sites, amplification of restriction fragments can be achieved by first ligating synthetic
oligonucleotide adaptors to the ends of restriction fragments, thus providing all restriction fragments with two common flanking
tags which will serve as anchor bases for the primers used in PCR amplification. Typically, restriction enzymes either
produce flush ends on a DNA fragment, such that the terminal nucleotides of both strands are base paired, or generate
staggered ends in which one of the two strands protrudes to give a short (1-4 nt) single strand extension. In the case
of restriction fragments with flush ends, adaptors are used with one flush end. In the case of restriction fragments with
staggered ends, adaptors are used that have a single stranded extension complementary to the single stranded
extension on the restriction fragment. Consequently, each type of restriction site end is specifically recognized by a
particular adaptor by virtue of the complementarity of the matched ends. In addition, the DNA sequence of the entire length
of each adaptor type differs from that of other adaptor types (see Table I). Typically, the adaptors used are comprised
of synthetic single-stranded oligonucleotides which are in part complementary to each other, which are usually
approximately 10 to 30 nucleotides long, preferably 12 to 22 nucleotides long, and which form double stranded
structures when mixed together in solution. Using the enzyme T4 DNA ligase, the adaptors are joined covalently and
specifically to the complementary ends of individual DNA molecules in the mixture of restriction fragments generated from
a particular genomic DNA source. Using a large molar excess of adaptors over restriction fragments ensures that all
restriction fragments will receive adaptors at both ends. These adaptors are not usually phosphorylated. These ligated
adaptors then serve as templates for the adaptor-directed PCR primers.

In one embodiment of the invention, all restriction fragments from the genome carry the same adaptor at both
ends, and a single PCR primer corresponding to that adaptor sequence can be used to amplify simultaneously from
the fragments. The simultaneous amplification of several different restriction fragments is often referred to as multiplex
PCR amplification. Since in such a case all restriction fragments are bordered at both ends by the same adaptor, it is
obvious that primer extension and PCR amplification of a mixture of tagged restriction fragments will amplify all
restriction fragments in a synchronous fashion. In another embodiment using two or more different restriction enzymes to
cleave the DNA, two or more different adaptors are ligated to the ends of the restriction fragments. In this case, two
different PCR primers, each matching the sequence of a particular adaptor, can be used for exponential amplification
from a subset of the restriction fragments. In one preferred embodiment using two or more restriction enzymes, both
adaptors are unmodified, and the fragment mixture is enriched using a pre-amplification step. In another preferred
embodiment using two or more restriction enzymes, the adaptor corresponding to one of the restriction enzyme site
ends is covalently linked to a biotin molecule. Using standard methods for isolating biotinylated molecules, this design
allows for the selection, from a complex mixture of restriction fragments, of only those bordered on one or both ends
by a biotinylated adaptor. Either of these two selection steps reduces the complexity of the starting mixture of
restriction fragments and constitutes an enrichment step prior to the PCR amplification, thereby reducing in certain
instances the background of fragments with same-site ends. In yet another embodiment one of the amplification primers
may be radiolabeled for identification of the products via autoradiography or may be modified with fluorescent tags for
fluorescence detection of products. Methods of labeling nucleic acids and suitable labels are well known in the art (see
Sambrook supra). For example, a radioisotope suitable in the present invention is 33P-phosphate, incorporated at the 5'
end of one strand of the adaptor by transfer of a phosphate group from [γ-33P]ATP under kinasing conditions.

Double stranded adaptors (with or without biotin labels) are generated by annealing the two partially complementary
single stranded component oligonucleotides of each pair (listed in Table I). For example, the double stranded Taq I
adaptor (Taq-Ad) is produced by combining the single-stranded Taq.AdF and Taq.AdR oligonucleotides under favorable
annealing conditions. To generate the biotinylated double stranded Pst I adaptor (biotin-Pst-Ad), the single stranded
oligonucleotides biotin-Pst.AdF and Pst.AdR similarly are combined. If restriction enzymes other than those listed in
Table I are used, then the adaptors must be designed to carry the appropriate protruding single-stranded ends for a
given restriction half-site. In a preferred embodiment, each adaptor contains a single base alteration within the restriction
half-site it carries, so that the reconstructed site generated by each ligation event cannot be redigested. Therefore,
the artisan will appreciate that restriction digestion and ligation can be performed simultaneously
under the appropriate buffer and temperature conditions.

DNA AMPLIFICATION: 

Although the basic protocols for the amplification of nucleic acids are well known in this art, significant modifications
of those protocols were necessary in order to achieve optimal amplification of DNA fragments and detection of
polymorphic products. Several factors were found to significantly influence these amplifications, including variation in the
thermocycling parameters of the PCR protocols, variation in the labeling of the primers, and the length and nucleotide
composition of the primers.

Thermocycling variation in PCR:

The efficiency of amplification by both 5'-anchored simple and compound SSR primers was tested on soybean,
corn, and mammalian templates, using thermocycling profiles having either constant temperature or touchdown
annealing conditions. Both the adaptors and the SSR-directed primers in these amplifications were designed to have
Tm's within a relatively narrow range (38-45°C in 50 mM Na or K salt), so that any primer pair chosen for an amplification
would have approximately the same optimal annealing temperature.

Three different constant annealing temperatures, 52°C, 58°C and 60°C, were tested using a standard 3-step per
cycle protocol. Although the results from each test varied from the others (the higher the temperature, the fewer the
products generated), primer discrimination at most target loci was found to be unacceptably
poor at any of these constant annealing temperatures. Every "product" from these amplifications was
represented by a small family of bands, and the products within each family differed in length by multiples of two nucleotides,
the length of each dinucleotide repeat unit. However, this "stuttering" effect and potential nonspecific product formation were
minimized when a touchdown thermocycling protocol (Don, et al., Nuc Acids Res, 19, 4008 (1991)) was used. In touchdown
amplification, the annealing temperature begins deliberately high, then is incrementally lowered in successive cycles, down to a
desired "touchdown" annealing temperature. Touchdown temperatures of 59°C, 56°C and 55°C were tested for some
primer combinations. For most SSR-directed primers, the 56°C final touchdown conditions produced the greatest
number of specific, non-stuttering bands. For the purpose of the present invention, therefore, it is most preferred that
nucleic acid amplification be conducted according to a touchdown protocol where 56°C is the final annealing
temperature. However, depending on the actual composition of the primers, a particular amplification may involve final
annealing temperatures of 55°C-60°C.
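
The shape of a touchdown profile can be sketched as a per-cycle list of annealing temperatures. This is a minimal illustration only: the function name is hypothetical, and the starting temperature (65°C), the 1°C decrement per cycle and the 25 cycles at the final temperature used in the example are assumptions, not values given in the text. Only the 56°C final temperature comes from the description above.

```python
def touchdown_schedule(start_temp, final_temp, step, cycles_at_final):
    """Return the annealing temperature (deg C) used in each successive cycle.

    The temperature starts high, drops by `step` each cycle until it would
    reach `final_temp`, and is then held at `final_temp`.
    """
    if step <= 0 or start_temp < final_temp:
        raise ValueError("need step > 0 and start_temp >= final_temp")
    temps = []
    temp = start_temp
    while temp > final_temp:
        temps.append(temp)
        temp -= step
    temps.extend([final_temp] * cycles_at_final)
    return temps

# Assumed example values; only the 56 deg C touchdown temperature is from the text.
schedule = touchdown_schedule(start_temp=65, final_temp=56, step=1, cycles_at_final=25)
```

With these assumed values the schedule spends nine cycles stepping down from 65°C to 57°C and the remaining cycles at 56°C.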

Another variable in the thermocycling protocol that was explored is the method by which the amplification reactions
are initiated. Typical PCR amplification protocols initiate with a cold start: all reagents necessary for DNA amplification
are present in the reaction mixture prior to the first denaturation step. In contrast, a hot start protocol calls for the
exclusion of one key reaction component, typically either the primer, nucleotides or polymerase, from the mixture during
reaction setup and the first denaturation. This component then is added following denaturation, and primer extension
can proceed.

The cold start protocol allows for the possibility that primers will anneal under nonstringent conditions both to
template sites that are not necessarily a perfect match, and to multiple, staggered sites within a target locus. Often
this method leads to a stuttering effect of the amplification products on the gel. In contrast, a hot start protocol prevents
spurious primer annealing to incorrect template sites at ambient temperatures prior to the first denaturation, and
generates products that resolve more sharply and discretely on the gel. When otherwise identical amplification reactions
were performed using the cold start and hot start protocols, it was found that for nearly every 5'-anchored simple
SSR-directed primer, a cold start produced unacceptably indistinct products on the gel. Much sharper products were
generated using hot start. In contrast, product resolution was found to be more consistent between cold and hot start
methods when using compound SSR primers. In spite of the potential drawbacks in product resolution, cold start
amplification was easier to perform, particularly when processing large numbers of samples, and was routinely found
to be sufficient to generate amplification products that could be distinguished as polymorphic between genomes.
Therefore, a slight gain in product resolution is sacrificed in a cold-start protocol in exchange for greater speed and ease of
reaction setup. If 5'-anchored simple SSR primers are used, a hot start is preferred, but a cold start is adequate and
sufficient for amplification reactions involving most compound SSR-directed primers.

Choice of primer labeling:

The complementary strands of a duplex DNA molecule usually resolve independently to slightly different positions
on a denaturing polyacrylamide gel. If both strands of a DNA duplex are radiolabeled, then the autoradiograph will
show a separate band representing each strand of each amplification product. Resolution of only a single band for
each amplification reaction product on the denaturing polyacrylamide gel requires that only one strand of each product
be labeled. To achieve this, only one of the two primers in any given pair used for an amplification reaction should be
labeled. Either 32P or 33P can be used as radiolabels, although 33P images are generally sharper on an autoradiograph.
Alternatively, a variety of different fluorophores can be incorporated into the primer; the resulting products can then be
detected using a fluorescence detection system.

The effects of radiolabeling the SSR primer, in comparison to the alternate labeling of the adaptor-directed primer,
were explored. In all protocol variations, radiolabeling the adaptor-directed primer resulted in significant background.
Generally, the lanes on the gel contained a few discrete bands, but the major result in every case was a smear of
products distributed along the entire length of the lane. In contrast, when only the SSR-directed primer was labeled in
the amplification reactions, the products were much more discrete on the autoradiograph and not associated with any
significant lane background. This difference likely results from the high abundance of adaptor target sites on a large
proportion of the template fragments (from which even linear amplifications with labeled primers collectively will lead
to significant backgrounds). In contrast, the labeled SSR primers have far fewer target sites in the template mixture,
and most become part of a productive, exponential amplification. Hence, 5'-labeling of only the SSR primer is preferred,
and would apply to labeling with either radioactive or fluorescent tags.


Variations in SSR and adaptor primer design: 

Variability in the SSR-to-adaptor amplification reaction, and therefore in the products obtained, results not only
from the reaction and thermocycle setup conditions described above but also from subtleties in the design of the primers
used in these amplifications. Once a particular compound SSR has been chosen as the target locus sequence for this
assay, either partially or entirely different sets of amplification products can still be obtained by altering any one
of the following primer design criteria:

a) the number and base composition of the 3'-extension nucleotide(s) on the adaptor-directed primer;

b) the relative lengths of the two constituent simple repeats that comprise the compound SSR primer;

c) the particular strand of the double-stranded compound SSR locus chosen to correspond to the single-stranded
primer (i.e., the directionality of the primer);

d) choice of restriction enzymes and SSR targets for a particular genome.

a) Length of the adaptor-directed primer: 

An adaptor-directed amplification primer which corresponds to the sequence of one of the synthetic adaptors
ligated to the restricted ends of the genomic DNA can carry a variable number of arbitrary sequence nucleotides (zero
to ten) at its 3'-end. These variable 3'-nucleotides on the primer anneal specifically to sequences that are directly
adjacent to the adaptor and restriction site and whose sequences are not known, a priori, on any particular genomic
restriction fragment. Recognition by each such primer of only a subset of all possible fragments in the template
mixture provides exquisite specificity in the amplification reaction (Zabeau, EP 534,858). Such primers, otherwise
identical in sequence except for differences in the few 3'-most nucleotide(s), can amplify completely nonoverlapping
sets of amplification products and behave much like allele-specific amplification primers (Newton et al., (1989) Nuc.
Acids Res. 17: 2503; Kwok et al., (1990) Nuc. Acids Res. 18: 999; Wu et al., (1989) Proc. Natl. Acad. Sci. USA 86:
2757). The key difference, however, is that use of these adaptor-directed primers requires no prior sequence knowledge
of the genomic locus to be amplified, and each primer will selectively co-recognize multiple target sites in a template
DNA mixture.

In general, the longer the variable 3'-extension, the more selective or restrictive the primer. This 3'-extension
contains arbitrary, nondegenerate or partially degenerate bases, which restrict annealing of the primer to only a subset of
the total number of potential target sites, thus leading to a reduction in the actual number of co-amplified products. The
addition of each nondegenerate nucleotide onto the 3'-extension leads hypothetically to 4-fold greater template
discrimination. In addition, different single nucleotides at the 3'-most base position(s) confer unique template specificities
to otherwise identical primers. Thus, varying both the number and composition of the 3'-selective nucleotides on the
adaptor-directed primer is sufficient to generate individual, either partially or completely nonoverlapping, sets of
amplification products from the same template when paired with a given SSR-directed primer. The choice of which
3'-extension to use for a particular amplification is largely a matter of chance, but still will depend largely upon relative
nucleotide frequencies in a target genome and upon the abundance in the genome of the specific SSR that serves as
the other priming site.
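
The way selective 3'-nucleotides split a template mixture into nonoverlapping subsets can be sketched as follows. This is an illustrative model only: the function names are hypothetical, and `flanking_seqs` stands for made-up genomic bases lying immediately internal to the adaptor and restriction site on each fragment, read in the primer's 5'->3' direction.

```python
from itertools import product

def selective_extensions(n):
    """All nondegenerate 3'-extensions of length n (there are 4**n of them)."""
    return ["".join(p) for p in product("ACGT", repeat=n)]

def fragments_primed(extension, flanking_seqs):
    """Fragments whose first bases past the restriction site match the extension."""
    return [s for s in flanking_seqs if s.startswith(extension)]

# Hypothetical flanking sequences for five adaptor-tagged fragments.
flanks = ["ACGT", "AGGT", "CCAT", "ATTT", "GACA"]

# Primers that differ only in a single 3'-base select disjoint fragment sets
# that together account for every fragment in the mixture.
subsets = {ext: fragments_primed(ext, flanks) for ext in selective_extensions(1)}
```

In this toy mixture the -A extension selects three fragments and a -AC extension narrows that to one, which mirrors the increasing restriction described above.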

b) Relative lengths of the two constituent simple repeats comprising the compound SSR primer:

Every simple and compound SSR locus in the genome is a double stranded structure whose individual strands
carry different permutations of nucleotides. Unless the core dinucleotides of a compound repeat are palindromic (e.g.,
(CA)x(TG)y or (AG)x(CT)y), a single-stranded primer that specifically anneals to one strand at a particular SSR
locus will not anneal to the opposite strand. None of the in-phase compound SSRs is palindromic and only 8 of the 90
possible dinucleotide permutations represent such a palindrome. Therefore, all in-phase and most out-of-phase
SSR-directed primers will primer-extend from each genomic target locus in a polar, unidirectional manner, and any compound
SSR locus can be recognized, and amplified from, by any of four primer classes. For example, the compound in-phase
SSR locus,

5'-CACACACACACACACACACACATATATATATATATATATA-3' (SEQ ID NO.: 1)
3'-GTGTGTGTGTGTGTGTGTGTGTATATATATATATATATAT-5' (SEQ ID NO.: 2)

can be recognized by four different canonical primer classes:
5'-(AC)x(AT)y-3',
5'-(CA)x(TA)y-3',
5'-(AT)x(GT)y-3',
and 5'-(TA)x(TG)y-3', where the 5'-most repeat in each serves primarily to anchor the primer to the template, and
the 3'-most repeat serves a primer-extension function (see Figure 1c).
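
The strand assignment of the four canonical primer classes can be verified with a short sketch. The locus below is an illustrative (CA)n(TA)m sequence with arbitrarily chosen repeat counts, and the function names and the repeat counts of the example primers (x = 4, y = 2) are assumptions made for the illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def strand_matched(primer, top_strand):
    """Report which strand, read 5'->3', contains the primer sequence.

    A primer identical in sequence to part of one strand anneals to, and is
    extended along, the complementary strand.
    """
    if primer in top_strand:
        return "top"
    if primer in reverse_complement(top_strand):
        return "bottom"
    return None

# Illustrative locus of the form 5'-(CA)n(TA)m-3'; repeat counts are arbitrary.
locus = "CA" * 11 + "TA" * 9

primers = {
    "(AC)x(AT)y": "AC" * 4 + "AT" * 2,
    "(CA)x(TA)y": "CA" * 4 + "TA" * 2,
    "(AT)x(GT)y": "AT" * 4 + "GT" * 2,
    "(TA)x(TG)y": "TA" * 4 + "TG" * 2,
}
result = {name: strand_matched(seq, locus) for name, seq in primers.items()}
```

Two classes match the upper strand and two match the lower strand, so each class primes in only one direction across the junction between the two repeats.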

Each of these four canonical primer classes can include a wide range of individual primers, all differing by the
length of the two constituent repeats within the primer. Changes in the lengths of these constituent repeats have
profound effects on primer efficacy and the fidelity of reproducible amplifications. In general, the longer the 5'-anchoring
repeat (i.e., the value of x, above) relative to that of the 3'-priming repeat (the value of y), the better the primer's
specificity and priming efficiency in the amplification. In addition, only a primer with a short 3' repeat will allow
amplification from compound SSR loci containing very short downstream repeats.

c) Polarity of the single stranded compound SSR-directed primer:

The choice of which strand of a double-stranded compound SSR locus to use as a primer can be critical
for determining the success of the SSR-to-adaptor amplification reaction. It should be noted that the only type of
(AT)-containing primer that will lead to efficiently generated amplification products under standard conditions is one in which
the (AT)n sequence is very short (1.5-3 repeat units) and is situated at the 3' primer-extension end. An (AT)n repeat
of any length at the 5' end is completely inefficient as an anchor, and results in little or no amplification from a complex
genomic template mixture.

d) Restriction site frequencies, nucleotide bias, SSR frequencies, and primer design considerations:

Ligation products carrying biotinylated adaptors may be selected out of each digestion/ligation mixture using a
streptavidin or avidin coated support such as paramagnetic beads, as provided by Dynal Inc. (Lake Success, NY).
This selected DNA does not have to be purified further from the beads for the subsequent amplifications. In one
embodiment where two restriction enzymes are used and only the adaptor corresponding to the restriction enzyme with
the hexanucleotide site is biotinylated, the selected DNA is a mixture of fragments bordered at both ends by the
hexanucleotide site or flanked at each end by different sites. For example, if Taq I and Pst I are used, then only the Pst I
adaptor is biotinylated. Following biotin selection, the Taq I-Pst I and Pst I-Pst I fragments are predicted to be present
in the enriched fragment mixture at an approximate ratio of 30:1, respectively. All Taq I-Taq I fragments are effectively
discarded. Methods for such a calculation will be apparent to one skilled in the art. For example, if the frequency of
each nucleotide is known for the specific genome, then the symmetric and asymmetric restriction fragments will be
present in the digestion mixture at predictable proportions. In general, a calculation can be made that derives from the
following assumptions:

First, recognition sites for each restriction enzyme are present in the genome at differing absolute frequencies,
which are a function of the number of nucleotides in the site and of the genome's nucleotide composition. Second,
these absolute frequencies can be converted to relative frequencies, p and q, since the sum of the relative frequencies
(p+q) is always equal to 1. For example (considering equal nucleotide frequencies and random nucleotide distribution
in a genome):



site     absolute frequency           relative frequency
Taq I    (0.25)^4 = 3.9 x 10^-3       p = 0.9412
Pst I    (0.25)^6 = 2.44 x 10^-4      q = 0.0589



Finally, the relative frequencies of restriction fragments bordered by these sites are simply the products of the relative
site frequencies for each fragment type. Therefore:



fragment                       relative frequency
Taq I-Taq I                    p^2 = 0.8862
Pst I-Pst I                    q^2 = 0.0035
Taq I-Pst I and Pst I-Taq I    2pq = 0.1109



Therefore in this embodiment, utilizing restriction enzymes with 4- and 6-bp recognition sites and assuming no
nucleotide bias, the biotin-selected DNA fragments represent only 11.1% of the genome. Different restriction enzymes
with different site frequencies will lead to a greater or lesser proportion of the genome represented in the mixture of
selected fragments. Using several restriction enzyme combinations will ensure better coverage of the genome than
just a single enzyme combination.
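
The calculation above can be reproduced with a few lines of Python. This is a sketch under the same stated assumptions (equal nucleotide frequencies and random nucleotide distribution); the function names are hypothetical. Exact arithmetic gives values that may differ slightly in the last digits from the rounded figures quoted in the text.

```python
def site_frequency(site_length, base_freq=0.25):
    """Expected frequency of a recognition site of the given length,
    assuming equal base frequencies and a random sequence."""
    return base_freq ** site_length

def fragment_classes(freq_a, freq_b):
    """Relative frequencies of A-A, B-B and mixed A-B restriction fragments."""
    p = freq_a / (freq_a + freq_b)   # relative frequency of site A
    q = freq_b / (freq_a + freq_b)   # relative frequency of site B
    return {"p": p, "q": q, "AA": p * p, "BB": q * q, "AB": 2 * p * q}

taq = site_frequency(4)   # Taq I, 4-bp recognition site
pst = site_frequency(6)   # Pst I, 6-bp recognition site
classes = fragment_classes(taq, pst)

# Ratio of Taq I-Pst I to Pst I-Pst I fragments after selection on the Pst I adaptor.
ratio = classes["AB"] / classes["BB"]
```

Under these assumptions the ratio works out to 2p/q = 32, in line with the approximate 30:1 figure given earlier, and the mixed-end fragments account for about 11.1% of all fragments.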

The selected DNA fragments may be used as a pooled template mixture for polymerase chain reaction
amplifications using one each of a primer corresponding to one of the adaptors (the one that was not biotin-selected) and a
primer directed to a particular 5'-anchored simple or compound SSR sequence. One skilled in the art will appreciate
that a detectable product will result from any single genomic template fragment only when exponential amplification
occurs between the adaptor-directed primer and an oppositely oriented SSR-directed primer (see Figure 1a). Multiple
amplification products are expected from each template DNA mixture since the SSR and adaptor sequences are not
single-copy sites. The multiplex ratio (the number of co-amplified products) of each amplification reaction is affected
by the absolute genomic copy number of a specific SSR sequence, and can be adjusted experimentally for a given
SSR by altering either the level of degeneracy of a simple SSR primer's 5'-anchor or the number and quality of the
nondegenerate selective nucleotides at the 3' end of the adaptor-directed primer. For example, assuming equal
frequencies for all four nucleotides in the genome, the addition of each successive nondegenerate nucleotide onto the
3'-end of the adaptor-directed primer leads to a 4-fold reduction in the number of co-amplified products from a given
template mixture (see Figure 1b). Therefore, an adaptor primer carrying zero selective nucleotides at its 3' end (e.g.,
Taq.AdF; see Table I) will co-amplify 4 times as many templates as will a primer with a single, nondegenerate
3'-nucleotide (e.g., Taq.prS, whose 3'-extension is -A). Similarly, this primer will co-amplify 4 times as many template
fragments as a primer carrying 2 selective nucleotides, and so on. In general, the degree of selectivity of the
adaptor-directed primer can be estimated using the formula $1/4^{n}$, where n = the number of selective bases. It should be
cautioned that although convenient, this simplified calculation does not take into account the base composition of the
genome, nor of the recognition sites of the restriction endonucleases used to produce the genomic fragments.
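
The 4-fold reduction per selective nucleotide, and the caution about base composition, can be illustrated with a small sketch. The function name and the biased base frequencies in the usage note are hypothetical, and treating successive bases as independent is an assumption that goes beyond the simplified estimate in the text.

```python
def selectivity(extension, base_freqs=None):
    """Expected fraction of adaptor-flanked fragments matched by a 3'-extension.

    With equal base frequencies this is 1/4 per selective base; with a biased
    genome each base contributes its own frequency (bases assumed independent).
    """
    if base_freqs is None:
        base_freqs = {base: 0.25 for base in "ACGT"}
    fraction = 1.0
    for base in extension:
        fraction *= base_freqs[base]
    return fraction

# Hypothetical AT-rich genome, used only to show the effect of nucleotide bias.
at_rich = {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2}
```

With no bias, extensions of zero, one and two bases give fractions of 1, 1/4 and 1/16. In the hypothetical AT-rich genome, an -AT extension is expected to match more than twice as many fragments as a -GC extension of the same length.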

Although most DNA fragments in the bead-selected or pre-amplified mixture should be bordered at one end by the
adaptor corresponding to the adaptor-directed primer to be used in the PCR, only a subset of these fragments are
expected to carry an internal simple sequence repeat region complementary to a particular SSR primer. Thus,
amplification products will be generated and detected only from the subset of target molecules that not only are flanked by
the primer-specific adaptor but also contain an internal repeat sequence matching the SSR primer. It should be noted
that absolute frequencies for the different repeats can vary widely within a species and are not accurately known for
most plant genomes. Preliminary studies indicate that in soybean, (AT)n is at least twice as abundant as (CT)n, which
in turn appears to be somewhat more frequent than (CA)n (Morgante & Olivieri, Plant J 3, 175 (1992); Akkaya et al.,
(1992) Genetics, 132:1131). In general, it is estimated that one SSR longer than 20 bp exists in plant genomes once
every 23-29 kb, compared to a figure of 6 kb in mammals (Wang et al., Theor Applied Genetics 88, 1 (1994); Morgante
& Olivieri, Plant J (1993); Beckmann & Weber, Genomics 12, 627 (1992)). Frequencies of compound SSR sequences,
however, have not been documented in the literature.

A completely degenerate 5'-anchor on a simple SSR primer should prime from every locus in the genome that
carries that particular SSR sequence. Any degree of nondegeneracy introduced into the anchor will reduce the potential
number of genomic target sites, and therefore the number of amplified fragments. In a genome with no nucleotide bias,
the complexity of the co-amplified products is reduced by a factor of 4 for every anchor position that is assigned a
nondegenerate nucleotide. Each self-anchoring compound SSR primer is expected to anneal and prime from every
matching compound SSR locus in the biotin-selected fragment mixture (providing each target SSR locus has a sufficient
length to allow complete hybridization by the primer).
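
The effect of anchor degeneracy on target-site number can be illustrated with a short sketch. It is not part of the described method: the function names are hypothetical, the letter codes are the standard IUPAC ambiguity codes, and the calculation assumes a genome with no nucleotide bias, as in the paragraph above.

```python
from fractions import Fraction
from itertools import product

# Standard IUPAC nucleotide codes (only those needed for the anchors discussed here).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer string."""
    return ["".join(bases) for bases in product(*(IUPAC[code] for code in primer))]


def anchor_fraction(anchor):
    """Fraction of SSR loci whose flanking bases match the anchor,
    assuming all four nucleotides are equally likely at each position."""
    fraction = Fraction(1)
    for code in anchor:
        fraction *= Fraction(len(IUPAC[code]), 4)
    return fraction
```

Under these assumptions a fully degenerate three-base anchor (`NNN`) matches every locus, fixing one position (`ANN`) leaves one quarter of them, and fixing two (`ACN`) leaves one sixteenth, which is the factor-of-4 reduction per nondegenerate position. A partially degenerate anchor such as `HBH` falls in between.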

Detection of polymorphisms between phenotypically related individuals:

Individual gel banding pattern differences of the co-amplified fragments between different templates (i.e., different
genomes) indicate polymorphisms between the source genomes. The amplification products generated with any of
the compound SSR-directed primers are a mixture of polymorphic and nonpolymorphic fragments. Compared to a
conventional AFLP reaction (EP 0534858), in which most of the polymorphisms detected are dominant, Applicants'
compound SSR-to-adaptor multiplexed amplification method generates a greater proportion of codominant
polymorphisms. Although many of the amplification products generated by this scheme are nonpolymorphic between closely
related strains, the proportion of polymorphic products increases between more distantly related lines. In general,
genomic polymorphisms can be detected using the SSR-to-adaptor multiplexed amplification among individuals from
within a species as well as between species; the greater the evolutionary distance between the genomes being
compared, the more polymorphisms are expected. Both dominant and codominant polymorphisms can be detected. In either
case, a polymorphism revealed by this method may result from any one or a combination of possible causes:

1) One or both restriction sites bordering a given genomic region are missing in one genome (analogous to an
RFLP, but here detected by a different method). This may be visible either as a dominant or a codominant difference
between genomes;

2) Insertion or deletion differences exist between genomes, within the genomic fragment bordered by common
restriction sites (this should be visible as a codominant polymorphism providing the amplification distance is not
too great for either allelic fragment);

3) Length differences in the simple sequence repeat between genomes can lead to codominant polymorphic
amplification products, generally differing in length by multiples of the repeat unit;

4) Single base differences are present between genomes in the region immediately adjacent to the restriction site,
such that the 3'-selective portion of the adaptor-directed primer can discriminate between dissimilar templates, in
a manner analogous to an allele-specific amplification.

Because of all these potential sources of polymorphism, the information content on a per locus basis for this type
of multiplexed amplification assay is very high. A simple estimate can be made for the minimum number of nucleotide
positions at a locus that are informative (i.e., at which a polymorphism may be detected). For templates digested with
both a 4- and 6-bp cutter:

      4          (within the 4 bp restriction site)
    + 6          (within the 6 bp restriction site)
    + 0 to 10    (sequence immediately adjacent to the adaptor-specific restriction site)
    + 0 to 30    (number of nucleotides within the SSR assayable for length variability between genomes)

    = 10 to 50 nucleotides per amplified locus may be informative for producing a polymorphism between individual
genomes. The first three factors in this sum result from single nucleotide variation (e.g., substitutions) between
genomes, whereas the fourth factor in the sum results from repeat length variation. Although small insertions and deletions
distributed in the entire genome can contribute to the detection of length variability in this as well as other genome
assays, the greatly increased probability for repeat length variation at each SSR target locus results in an "above
background" level of length polymorphism detectable in the products.

In comparison, the information content for RFLP is 8-12, for RAPDs is 16-18, and for a standard AFLP assay is
generally 10-15 nucleotides. Furthermore, compared to conventional AFLP and RAPD technologies, a larger proportion
of the polymorphism assayable by this SSR-to-adaptor amplification method is detectable as codominant differences
between genomes.
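
The per-locus estimate above is a simple sum of ranges, and can be reproduced as a minimal sketch. The dictionary keys and function name are hypothetical labels chosen for illustration; the numeric ranges are those given in the text for a template digested with a 4-bp and a 6-bp cutter.

```python
# (minimum, maximum) informative nucleotide positions contributed by each source.
SSR_TO_ADAPTOR_COMPONENTS = {
    "4 bp restriction site": (4, 4),
    "6 bp restriction site": (6, 6),
    "sequence adjacent to adaptor-specific site": (0, 10),
    "SSR positions assayable for length variation": (0, 30),
}

# Ranges quoted in the text for the other marker systems.
OTHER_ASSAYS = {"RFLP": (8, 12), "RAPD": (16, 18), "AFLP": (10, 15)}


def informative_range(components):
    """Sum the per-source minima and maxima to get the per-locus range."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high
```

Calling `informative_range(SSR_TO_ADAPTOR_COMPONENTS)` gives the 10 to 50 range stated above, whose upper bound exceeds the upper bounds quoted for RFLP, RAPD and standard AFLP assays.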

SSR-based polymorphisms detected using this SSR-to-adaptor, or SAMPL, amplification method can be converted
into more conventional and convenient single-locus SSR markers. This conversion can be performed, for example, if
the multiplexed approach is used to quickly screen through hundreds or thousands of possible polymorphisms between
genomes, and if it then is desirable to assay a chosen subset of these polymorphisms either at a larger,
higher-throughput scale or in order to examine polymorphism at these particular loci more quickly and nonisotopically.
This conversion process requires that the desired band be excised from the SAMPL gel and then sequenced. From
the nucleotide sequence deduced for the unique sequence flanking the SSR, a locus-specific primer can be designed,
which flanks and is oriented towards the SSR. This unique primer can then be paired with a general adaptor-directed
primer and used to amplify from the original fragment mixture. The resulting adaptor-to-unique primer PCR
product then can be sequenced to discover the other unique flanking sequence of the SSR, and the second
locus-specific primer can be designed. Finally, the two oppositely oriented locus-specific flanking primers are used as a pair
to amplify the region spanning the desired SSR locus in a target genome.

EXAMPLES 

MATERIALS AND METHODS 

Restriction enzymes, ligases and polymerases used in the following examples were obtained from BRL Life
Technologies (Gaithersburg, MD) or New England Biolabs (Beverly, MA).

The source of the soybean cultivars, Bonus and soja PI 81762, was Theodore Hymowitz, University of Illinois. All
other soybean lines, including the G. max cultivars wolverine, NOIR-1, N85-2176, Harrow, CNS, Manchu, Mandarin,
Mukden, Richland, Roanoke, Tokyo and PI 54-60, and the G. soja accession PI 440.913, were obtained from the USDA
Soybean Germplasm Collection, University of Illinois (Dept. Agronomy, Turner Hall, Urbana, IL). The source of the
Z. mays inbred cultivars, B73 and Mo17, and the elite lines, LH82, LH119 and LH204, was Holden's Foundation Seeds,
Williamsburg, IA. The source of the Z. mays inbred line, CM37, was Benjamin Burr, Brookhaven National Laboratory,
Upton, NY. The AEC272 and ASKC28 Z. mays lines were obtained from Dr. Denton Alexander, University of Illinois.
Genomic DNA from five different human sources, as well as from salmon and mouse BALB/c, was purchased from
commercial sources (Sigma, St. Louis, MO or Clontech, Palo Alto, CA).

Reagents, buffers and protocols used for restriction digests, ligations, 5'-end phosphorylation-labeling of primers
and PCR amplifications are given below in Table III.

Preparation of genomic template DNA mixtures:

Genomic DNA was isolated from soybean (Glycine max) cultivars using a CTAB/chloroform extraction and CsCl
centrifugation method (Murray et al., Nuc Acids Res, 8, 4321 (1980)), and from corn (Zea mays) cultivars using a urea
extraction miniprep method (Chen et al., in The Maize Handbook, M. Freeling and V. Walbot, eds., (1993) pp 526-527,
New York).

Purified genomic DNA was prepared for amplification reactions in a manner similar to that described by Zabeau
(EP 534,858), by complete restriction endonuclease digestion followed by or coupled with ligation of site-specific
double-stranded adaptors. The restriction enzyme combinations used for the following examples were either Taq I + Pst I
or Taq I + Hind III (a combination of enzymes with tetra- and hexa-nucleotide recognition sites, respectively). Between
1 and 2.5 ug of high molecular weight genomic DNA was digested with 5 units/ug of Taq I in a 50 uL volume at 65°C
for approximately 3 hours in a buffer containing 10 mM Tris acetate, 10 mM magnesium acetate, 50 mM potassium
acetate, 5 mM dithiothreitol, pH 7.5, then digested further in the same buffer with 5 units/ug of Pst I or Hind III at 37°C
for 3 h (Table III). The digestion products generated from each input genomic DNA were a mixture of symmetric
fragments, bordered at both ends by either Taq I or Pst I (or Hind III) sites, and asymmetric fragments flanked by both a
Taq I site and a hexanucleotide site.

Double stranded adaptors were generated by slowly annealing equimolar amounts of the two partially
complementary single stranded component oligonucleotides of each pair (see Table III). The double stranded Taq I adaptor
(Taq-Ad) at 50 pmole/uL was produced by combining 5000 pmole each of Taq.AdF and Taq.AdR single-stranded
oligonucleotides with H2O to a final volume of 100 uL. For the 5 pmole/uL Pst I and Hind III adaptors (biotin-Pst-Ad or
biotin-Hind-Ad), 500 pmole each of the corresponding single stranded oligonucleotides for each were combined in a
final volume of 100 uL. To generate the double-stranded molecules, all mixtures were incubated at sequentially
decreasing temperatures: 65°C for 15 min, 37°C for 15 min, room temperature for 15 min, then finally at 4°C.

This section describes the method that utilizes biotin-streptavidin selection for enriching the genomic fragment
mixture prior to the SSR-to-adaptor amplification. To each completed double digestion was added 10 uL of a mixture
containing 1 unit T4 DNA ligase, 0.2 mM ATP, and the two double-stranded adaptors, Taq-Ad (50 pmole) and
biotin-Pst-Ad or biotin-Hind-Ad (5 pmole), each carrying a different synthetic DNA sequence along its length and each
complementary at one end to the single-stranded tetranucleotide or hexanucleotide overhangs on the genomic fragments
(Table III). These adaptor ligation reactions were incubated at 37°C for 3 h. Each adaptor contained a single base
alteration within the half-restriction site it carries, so that the reconstructed site generated by each ligation cannot be
re-digested. Therefore, restriction enzyme digestions and ligations can be performed simultaneously providing the
activities of all enzymes used share a common optimum reaction temperature.

A subset of ligation products, all carrying a biotinylated Pst I adaptor on at least one end, was selected out of each
digestion/ligation mixture using streptavidin coated paramagnetic beads (Dynal, Lake Success, NY). For each
selection, 10 uL beads, washed once in 200 uL STEX (100 mM NaCl, 10 mM Tris-HCl, 1 mM EDTA, 0.1% Triton X-100, pH
8.0) then resuspended in 150 uL STEX, were added to each ligation reaction, and the mixtures incubated for one hour
at room temperature on a gently rocking platform. The DNA-adhered beads then were selected out of each mixture
using a magnetic rack support; the supernatant was aspirated away, and the beads were resuspended in 200 uL STEX.
Four additional cycles of bead selection, washing and aspiration were performed. The final resuspension was
transferred to a fresh tube, and the DNA-adhered beads selected in this final cycle were resuspended in 10 mM Tris-HCl,
0.1 mM EDTA, pH 8.0 (100 uL for 1 ug input DNA or 200 uL for 2-2.5 ug input DNA). This selected DNA, which did not
have to be purified further from the beads, was a mixture of Taq I-Pst I (or Taq I-Hind III) and Pst I-Pst I (or Hind III-
Hind III) fragments, present at an approximate ratio of 30:1, respectively. The selected DNA fragments were used as
pooled template for polymerase chain reaction amplifications using one each of a Taq I adaptor-directed primer and a
simple sequence repeat (SSR)-directed primer.

For the alternative method of enriching the restriction fragment mixture prior to the final PCR amplification, both
restriction site specific adaptors may be either biotin-modified or unmodified. Following adaptor ligation to the restriction
fragments, the entire mixture is subjected to up to 16 individual amplification reactions, each of which employs a pair
of individual adaptor-directed primers. One primer of each pair corresponds to one of the adaptor sequences, and the
second primer to the second adaptor sequence. In each case, the amplification primer carries a single, randomly
chosen nucleotide at its 3'-most end. Each primer pair will specifically amplify an approximately 1/16th subset of the
original genomic fragment mixture. Any or all of the pre-amplified product mixtures derived from a common restriction
fragment population then can serve as an enriched template for the subsequent amplification between an SSR-directed
primer and a primer representing either of the two flanking adaptors. In this case, this adaptor primer typically carries 2,
3, or more arbitrary 3'-nucleotides, with the nucleotide closest to the restriction site perfectly matching the one
nucleotide used on the pre-amplification adaptor primer.
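
The arithmetic behind the 16 pre-amplification reactions can be sketched as follows. This is only an illustration: the function names are hypothetical, and the 1/16 figure assumes that the four bases are equally likely at each selective position.

```python
from fractions import Fraction
from itertools import product

BASES = "ACGT"


def preamp_primer_pairs():
    """All combinations of one 3'-selective base on each of the two adaptor primers."""
    return list(product(BASES, repeat=2))


def subset_fraction(n_selective):
    """Expected fraction of fragments amplified with n selective bases in total."""
    return Fraction(1, 4 ** n_selective)


def compatible_extensions(preamp_base, length):
    """3'-extensions of the given length whose first base (the one closest to the
    restriction site) matches the base used in the pre-amplification."""
    if length < 1:
        raise ValueError("extension must contain at least one nucleotide")
    tails = product(BASES, repeat=length - 1)
    return [preamp_base + "".join(tail) for tail in tails]
```

For example, `preamp_primer_pairs()` lists the 16 possible primer pairs, `subset_fraction(2)` gives the 1/16 subset amplified by any one pair, and `compatible_extensions("C", 2)` lists the four two-base extensions that could be used on a template pre-amplified with a `C`-selective adaptor primer.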

EXAMPLE 1

AMPLIFICATION USING 5'-ANCHORED SIMPLE SSR PRIMERS IN A MULTIPLEXED SSR-TO-ADAPTOR
AMPLIFICATION TO DETECT POLYMORPHISMS AMONG SOYBEAN CULTIVARS

Example 1 describes the use of primers corresponding to simple SSR sequences flanked at the 5'-end by
degenerate nucleotides, for the amplification of adaptor tagged restriction fragments and the subsequent detection of genetic
polymorphisms among soybean cultivars. Adaptor-directed primers used in these amplifications are shown in Table I.
The SSR primers, herein termed 5'-anchored simple SSR primers, are listed in Table IV.

TABLE IV
5'-ANCHORED SIMPLE SSR PRIMERS

Primer Name     Sequence (5'->3')/SEQ ID NO.           Length   Tm (50 mM salt)   %GC
HBH(AG)8.5      HBHAGAGAGAGAGAGAGAGA/SEQ ID NO.:3      20       45.7°C            45.00
BHB(GA)8.5      BHBGAGAGAGAGAGAGAGAG/SEQ ID NO.:4      20       47.7°C            52.50
DVD(TC)8.5      DVDTCTCTCTCTCTCTCTCT/SEQ ID NO.:5      20       45.7°C            47.50
VDV(CT)8.5      VDVCTCTCTCTCTCTCTCTC/SEQ ID NO.:6      20       47.7°C            52.50
DBD(AC)7.5      DBDACACACACACACACA/SEQ ID NO.:7        18       41.8°C            47.20
BDB(CA)7.5      BDBCACACACACACACAC/SEQ ID NO.:8        18       44.1°C            52.80
HVH(TG)7.5      HVHTGTGTGTGTGTGTGT/SEQ ID NO.:9        18       41.8°C            47.20
VHV(GT)7.5      VHVGTGTGTGTGTGTGTG/SEQ ID NO.:10       18       44.1°C            52.80
CCGG(T)10       CCGGTTTTTTTTTT/SEQ ID NO.:11           14       23.4°C            28.60
GCGC(A)10       GCGCAAAAAAAAAA/SEQ ID NO.:12           14       23.4°C            28.60
BDBD(AC)6.5     BDBDACACACACACACA/SEQ ID NO.:13        17       39.5°C            47.00
BHBH(AG)6.5     BHBHAGAGAGAGAGAGA/SEQ ID NO.:14        17       39.5°C            47.00
VHVH(TG)6.5     VHVHTGTGTGTGTGTGT/SEQ ID NO.:15        17       39.5°C            47.00
CGGC(AC)6.5     CGGCACACACACACACA/SEQ ID NO.:16        17       44.3°C            58.80



Adaptor modified, biotin-selected genomic DNA from the Glycine max strains, NOIR-1, N85-2176 and wolverine,
and the Glycine soja cultivar, PI 81762, were prepared as described in the MATERIALS AND METHODS. SSR primers
were 5'-end labeled with 33P by combining fifty microcuries of [γ-33P]ATP (New England Nuclear) with 150 ng of primer
and 5 units of T4 polynucleotide kinase in a 30 uL reaction (Table III). After incubation at 37°C for 1 h, a 1 uL aliquot
of this labeled primer mixture (containing 5 ng primer) was used directly in each amplification reaction.

All amplification reactions were performed simultaneously by cold start initiation and were set up together at room
temperature, using a series of reaction cocktails. The master mixture, containing the four components common to all
reactions, consisted of (per reaction) 2.0 uL of 10X PEC buffer, 0.8 uL all four dNTPs (5 mM each), 13.0 uL H2O, and
0.1 uL (0.5 units) Amplitaq DNA polymerase (Perkin Elmer Roche) (see Table III and MATERIALS AND METHODS).
An aliquot of this master mixture was then added to the appropriate primers to make individual full-primer cocktails;
for each final amplification reaction, 15.9 uL of master mix was combined with 0.6 uL unlabeled Taq I adaptor-directed
primer (stock is 50 ng/uL), 0.5 uL unlabeled SSR primer, and 1.0 uL (at 5 ng/uL) 33P-labeled SSR primer. The
appropriate biotin-streptavidin selected template DNA (2 uL each) was distributed into 0.2 mL microamplification tubes
(Robbins Scientific, Mountain View, CA) and placed in individual wells of a Perkin-Elmer 9600 multiwell plate. Eighteen uL
of the appropriate full-primer cocktail then was added, to give all reactions a final volume of 20 uL. This final combination
of reaction components was quickly completed and the amplification reactions initiated on a Perkin Elmer 9600
thermocycler with as little delay as possible using either a constant 58°C annealing or a 58°C touchdown thermocycling
protocol:



Touchdown annealing
    denature      94°C, 3 min
    1 cycle       94°C, 30 sec
                  65°C, 30 sec
                  72°C, 1 min
    11 cycles     94°C, 30 sec
                  64.4°C, 30 sec for 1st cycle, then decrease by 0.6°C per cycle
                  for the next 10 cycles to a final 58.4°C
                  72°C, 1 min
    23 cycles     94°C, 30 sec
                  58°C, 30 sec
                  72°C, 1 min

Constant temperature annealing
    denature      94°C, 3 min
    35 cycles     94°C, 30 sec
                  58°C, 30 sec
                  72°C, 1 min
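
The reaction volumes given before the thermocycling profiles can be checked, and scaled to a batch of reactions, with a short sketch. The per-reaction volumes are those stated in the text; the function names and the 10% pipetting overage are illustrative assumptions, not part of the described protocol.

```python
from decimal import Decimal

# Per-reaction volumes in uL, as listed in the text.
MASTER_MIX = {
    "10X PEC buffer": Decimal("2.0"),
    "dNTPs (5 mM each)": Decimal("0.8"),
    "H2O": Decimal("13.0"),
    "Amplitaq DNA polymerase": Decimal("0.1"),
}
PRIMERS = {
    "unlabeled Taq I adaptor-directed primer": Decimal("0.6"),
    "unlabeled SSR primer": Decimal("0.5"),
    "33P-labeled SSR primer": Decimal("1.0"),
}
TEMPLATE = Decimal("2.0")


def volume(components):
    """Total volume (uL) of a set of components."""
    return sum(components.values(), Decimal("0"))


def final_volume():
    """Master mix + primers + template for one reaction."""
    return volume(MASTER_MIX) + volume(PRIMERS) + TEMPLATE


def dntp_final_mM():
    """Final concentration of each dNTP in the completed reaction."""
    return MASTER_MIX["dNTPs (5 mM each)"] * Decimal("5") / final_volume()


def scale_master_mix(n_reactions, overage=Decimal("0.10")):
    """Scale the master mix for n reactions; the overage is a hypothetical allowance."""
    factor = Decimal(n_reactions) * (Decimal("1") + overage)
    return {name: vol * factor for name, vol in MASTER_MIX.items()}
```

The totals agree with the text: 15.9 uL of master mix, 18 uL of full-primer cocktail and a 20 uL final reaction, which corresponds to 0.2 mM of each dNTP.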



The completed amplification reactions were diluted with an equal volume of formamide stop solution (98%
deionized formamide, 2 mM EDTA, 0.05% bromophenol blue, 0.05% xylene cyanol), heated to 94°C for 3 min, then quickly
chilled on ice. 2.5 uL of each was immediately loaded onto a 4.5% or a 6% denaturing 30 x 40 x 0.4 cm polyacrylamide
gel (7 M urea, 4.5% acrylamide:N,N'-methylenebisacrylamide [19:1], 100 mM Tris-HCl, 80 mM boric acid, 1 mM EDTA,
pH 8.3) that was first pre-run in the same tris-borate-EDTA buffer at 55 W for 30 minutes. The loaded samples were
electrophoresed at 55 W (corresponds to ~1400-1500 V, 35-40 mA) for two hours, then the gel was transferred to
chromatography paper, vacuum dried for 2 h at 80°C, and exposed to Hyperfilm-MP (Amersham) or Biomax (Kodak)
X-ray film with an intensifying screen at -70°C for 2 to 7 days.

Figure 2 shows a comparison of the amplification products using the 5'-anchored simple SSR-directed primers,
DBD(AC)7.5, HBH(AG)8.5, and HVH(TG)7.5. In all cases, the SSR primer was 33P-labeled and used in combination with
each of two different Taq I adaptor-directed primers, Taq.AdF and Taq.pr8, which carry zero and one 3' selective
nucleotide, respectively (see Table I). Panel a shows the amplification products generated using constant temperature
thermocycling and panel b shows the products from a touchdown protocol. Several different 5'-anchored simple
SSR-directed primers have been tested, all of which are listed in Table IV. In general, it was found that all such simple
SSR-to-adaptor amplifications required a 0-2 nucleotide extension on the 3' end of the adaptor-directed primer to give a
suitable number of co-amplified products.

Regardless of the annealing temperature, the constant temperature annealing protocol produced bands on the gels
that were extremely smeared and indistinct for nearly every primer combination tested. Raising the annealing
temperature from 56°C to 58-59°C resulted in somewhat less smeariness (Figure 2a); the products of individual loci generally
are discernible. However, the products are not discretely sized, and instead showed a high degree of "stutter" on the
gel. The highest constant annealing temperature tested, 60°C, produced relatively few bands for some primer
combinations, and no bands for others (data not shown). These results indicate that primer discrimination
at most target loci is relatively inefficient when the annealing temperature is held constant throughout the thermocycling.
Either the primer does anneal, but at multiple positions within a target locus (producing a stuttering effect), or it does
not anneal stably to generate a product. Although an optimal constant annealing temperature for any primer pair
ultimately should be determined empirically, it is likely that heterogeneously sized amplification products still will result
using this thermocycling method.

In contrast, thermocycling using touchdown conditions (Don et al., Nuc Acids Res, 19, 4008 (1991)) is designed
to lead to highly efficient target locus discrimination and to minimize or eliminate spurious priming by either of the
primers in the amplification. In touchdown amplification, the annealing temperature begins deliberately high, then is
incrementally lowered in successive cycles, down to a desired "touchdown" annealing temperature. Touchdown
temperatures of both 58°C and 56°C were tested. For most SSR-directed primers, this range of final touchdown
temperatures was optimal for producing a large number of relatively discrete bands on the gels (see Figure 2b for an example
of the products resulting from 58°C touchdown reactions; other data not shown), and these bands were reproducible
from experiment to experiment.
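
The stepwise lowering of the annealing temperature can be sketched as a simple schedule. The sketch assumes the touchdown profile listed for this example (one cycle at 65°C, a first touchdown cycle at 64.4°C stepping down over 11 cycles to 58.4°C, then 23 cycles at 58°C); the 0.6°C step is the value consistent with those endpoints, and the function names are hypothetical.

```python
def touchdown_temperatures(start_tenths, step_tenths, n_cycles):
    """Annealing temperature (°C) for each touchdown cycle.

    Temperatures are handled as integer tenths of a degree so that
    repeated subtraction does not accumulate floating-point error.
    """
    return [(start_tenths - i * step_tenths) / 10 for i in range(n_cycles)]


def full_annealing_profile():
    """Annealing temperature for every cycle of the assumed 58°C touchdown protocol."""
    profile = [65.0]                                   # 1 initial cycle
    profile += touchdown_temperatures(644, 6, 11)      # 64.4°C down to 58.4°C
    profile += [58.0] * 23                             # remaining cycles
    return profile
```

Under these assumptions the profile contains 35 cycles in total, the same number as the constant-temperature protocol, so the two protocols differ only in how the annealing temperature is reached.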

Individual banding pattern differences in the co-amplified fragments between different templates (i.e., different
genomes) indicate polymorphisms between the source genomes. The majority of the amplification products generated
by this scheme appear to be nonpolymorphic between the two closely related G. max strains, NOIR-1 and N85-2176;
however, a greater number of polymorphic products are seen between the more distantly related G. max wolverine
and G. soja PI 81762 cultivars. In addition, some polymorphisms appear to be dominant (a band amplified from one
genome, but no apparent corresponding band from the other), and a few are potentially codominant (bands of similar
but nonidentical size amplified from each genome).

This analysis illustrates three important features of the ability to fine tune and customize this assay. First, the two
different 5'-anchored simple SSR directed primers, HBH(AG)8.5 and DBD(AC)7.5, when used against a common set of
genomic templates, produced completely different amplification patterns. That these patterns reflect distinctly different
subsets of amplification products is consistent with the idea (see Figure 1) that different subsets of restriction fragments
from the genome are likely to carry different SSR target sites. Each band of the gel is the result of a productive
amplification between a particular SSR sequence oppositely oriented relative to a Taq I site. Given the estimation for relatively
large spacing between SSR sequences of all types in the soybean genome (Wang et al., Theor Applied Genetics, 88,
1 (1994); Morgante & Olivieri, Plant J 3, 175 (1993)), it is unlikely that the SSR-to-Taq I site products derived from the
HBH(AG)8.5 and DBD(AC)7.5 primers in this example cover any genomic loci in common.

A second feature illustrated by this example is that for any chosen SSR primer, the fewer the number of 3'-selective
nucleotides on the adaptor-directed primer, the greater the number of co-amplified bands. In general, it is expected
that the mixture of products generated using the n=0 version of the adaptor primer is more complex than that for an
n=1 primer, and this product mixture is more complex than that for an n=2 primer, and so on. In this example, the Taq
I adaptor-directed primer, Taq.AdF, carrying zero 3'-selective nucleotides, consistently produced a greater number of
labeled reaction products when paired with a given SSR primer, compared to primers carrying one selective nucleotide
(Taq.pr6 [n=1=A] or Taq.pr8 [n=1=C]). While in theory, an increase in length of the 3'-extension from n=0 to n=1 should
decrease the number of amplified fragments four-fold, it is difficult in practice to quantitate the real difference, primarily
because of the great degree of general smeariness and poor resolution of the products derived from 5'-anchored simple
SSR primers. Clearly, however, the greatest number of bands, along with the highest levels of background, is visible
in the lanes representing reactions amplified with Taq.AdF.

Third, thermocycling under touchdown conditions generally serves to reduce the smearing within
the lanes, and generally makes for sharper product bands, in comparison to use of a constant annealing temperature.
However, these sharper bands still are accompanied by a great degree of stutter, which hinders precise comparison
of polymorphisms between lanes (genomes).

In general, genomic polymorphisms can be detected among individuals from within a species as well as between
species; the greater the evolutionary distance between the genomes being compared, the more polymorphisms are
expected. Both dominant and codominant polymorphisms are revealed. However, no matter what the specific reaction
thermocycling condition, the use of 5'-anchored simple SSR primers in an SSR-to-adaptor amplification is not ideal for
identifying new polymorphisms. Even when the annealing conditions are carefully optimized, as in touchdown
thermocycling or in a hot start initiation (not shown), the individual co-amplification products resolve on the gels as rather
smeary and indistinct. The high degree of stutter apparent on the autoradiographic images counteracts the clarity of
the bands attainable using conventional AFLP (Zabeau EP 534,858), and prevents accurate identification of all but the
most prominent polymorphisms.

EXAMPLE 2

AMPLIFICATION USING PERFECT COMPOUND SSR PRIMERS IN A MULTIPLEXED SSR-TO-ADAPTOR
AMPLIFICATION TO DETECT POLYMORPHISMS AMONG SOYBEAN CULTIVARS

Example 2 illustrates the use of perfect compound SSR-directed primers in an SSR-to-adaptor amplification
method similar to that discussed in Example 1 as a means to improve the resolution and increase the level of polymorphism
among the multiplexed amplification products. All of the individual compound SSR primers used for these amplifications
are listed in Table V.

Most of these compound SSR sequences are represented either on sequenced soybean genomic clones that were
shown by hybridization to contain one of the dinucleotide repeats that comprise the compound SSR (M. Morgante and
C. Andre, unpublished), or on cloned plant and animal sequences that have been entered into public DNA sequence
databases (e.g., GenBank; see Table II).

DNA templates were generated essentially as described in Example 1, from the G. max cultivars wolverine,
NOIR-1, N85-2176, Harrow, CNS, Manchu, Mandarin, Mukden, Richland, Roanoke, Tokyo, PI54-60, and Bonus, as well as
from G. soja accessions PI81762 and PI440.913. Following double digestion with either Taq I + Pst I or Taq I + Hind
III, ligation of double stranded Taq I-Ad and biotin-Pst I-Ad (or biotin-Hind III-Ad) adaptors, and selection of fragments
carrying at least one biotinylated Pst I or Hind III end, the amplification reactions were performed using a series of
reaction cocktails and a cold start setup, all as described in Example 1. In all cases, only the compound SSR primer
was 5'-end labeled using [γ-33P]ATP. The amplifications were performed on a Perkin Elmer 9600 thermocycler using
a 56°C final annealing temperature touchdown profile:



    denature      94°C, 3 min
    1 cycle       94°C, 30 sec
                  65°C, 30 sec
                  72°C, 1 min
    11 cycles     94°C, 30 sec
                  64.3°C, 30 sec for 1st cycle, then decrease by 0.7°C per cycle
                  for the next 10 cycles to a final 56°C
                  72°C, 1 min
    23 cycles     94°C, 30 sec
                  56°C, 30 sec
                  72°C, 1 min



Following electrophoresis on 6% denaturing polyacrylamide gels in tris-borate-EDTA buffer essentially as
described in Example 1, the co-amplified products were visualized by autoradiography after intensifying screen enhanced
exposure at -70°C to Kodak Biomax X-ray film.

All of the compound SSR-directed primers listed in Table V have been used in this protocol to detect polymorphisms
among different soybean cultivars. All primers used represent perfect compound SSR sequences in which one of the
component nucleotides is "in-phase" across the two adjacent dinucleotide repeats. Surprisingly, not all of the primers
listed in Table V were equally effective at generating products, and not all generated products with the same degree
of polymorphism, even when they were predicted to do so based upon cloned sequence compilations (Table II). The
compound primer that produces the greatest number of co-amplified fragments from the soybean genome is
(CA)x(TA)y (where x and y are multiples of 0.5, but each ≥1). Figure 3a shows the amplification products from a
(CA)7.5(TA)2.5 version of this primer, used in combination with Taq.AdF (containing zero 3'-selective nucleotides) and
Taq.pr8 (one 3'-nucleotide, -C). Also shown in Figure 3a are the products from the compound sequence primers
(TC)4.5(TG)4.5 and (CT)7.5(AT)3.5, which amplify only relatively few fragments even when the Taq I primer is completely
nonselective. This result was surprising, since (CT)x(AT)y sequences appear to be the second-most abundant class of
compound repeat on isolated soybean clones (see Table II). As primers, (CA)7.5(TA)2.5 and (CT)7.5(AT)3.5 differ primarily
in the length of their 3'-(AT)y repeat. The differing efficiencies of these two primers in otherwise identical amplification
reactions may largely be a function of the length of the "leading" 3'-(AT)y sequence. In contrast, the extremely low number of
amplified products resulting from (TC)4.5(TG)4.5 is probably the result of low copy number of this compound repeat in
the soybean genome (see Table II). Shown in Figure 3b are the products generated using (TG)4.5(AG)4.5 and
(TG)4.5(AC)4.5, each in combination with the same two Taq I adaptor-specific primers. An intermediate number of products
is amplified by each of these two compound SSR sequences.

The amplification products generated with any of the perfect compound SSR-directed primers are a mixture of
polymorphic and nonpolymorphic fragments. For example, some of the products from the (CA)7.5(TA)2.5 primer are
completely nonpolymorphic among all 15 different G. max and G. soja genotypes tested; however, many of the products
reflect either dominant or codominant polymorphisms among these genomes. The products amplified using
(CA)7.5(TA)2.5 in combination with either Taq.AdF (n=0) or Taq.pr8 (n=1=C) from Figure 3a were cataloged:
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10 



                               AFLP(a)     SSR-to-adaptor amplification, (CA)7.5(TA)2.5
                                           + TaqAdF(b)     + TaqPr8

monomorphic                    64          12              14
dominant(c)                    34          44              40
codominant(d)                  1           8               8
codominant + dominant(e)       3           6               16

TOTAL                          102         72              78
% polymorphic products         37%         83%             82%
Expected heterozygosity(f)     0.32        0.44            0.43

(a) Two paired adaptor-directed primers corresponding to the Taq I adaptor and Pst I adaptor, respectively (not shown).

(b) The TaqAdF lanes carry a high level of background, and only the most unambiguous polymorphisms were scored.

(c) A band was scored as polymorphic in the indicated category if at least one of the 15 genotypes showed a difference
from the others.

(d) These scorings are very conservative, minimum estimates of the true incidence of codominant, allelic products; only
the most obvious codominant relationships among the 15 template genotypes were scored. A more accurate measurement
of the codominance frequency in these reactions requires analysis of the segregation of potentially allelic bands in a
population derived from pairs of these genotypes.

(e) A band could be scored as both codominant and dominant if it appears to be completely absent from at least one
genotype and if it also was represented by at least two size variants in bands that were amplified in other genotypes.

(f) Expected heterozygosity ($H = 1 - \sum_j p_j^2$) for each band (locus) was calculated from the allele frequencies
($p_j$) for that locus; an average of H for every polymorphic locus could then be calculated.
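
The two summary rows of the table can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the procedure the applicants used: the function names are hypothetical, and the heterozygosity function simply evaluates H = 1 - sum(p_j^2) for a list of allele frequencies supplied by the caller.

```python
def expected_heterozygosity(allele_freqs):
    """Return H = 1 - sum(p_j^2) for one locus (band)."""
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def percent_polymorphic(monomorphic, total):
    """Percentage of scored products that are not monomorphic."""
    return 100.0 * (total - monomorphic) / total


# Hypothetical locus with two equally frequent alleles.
h_example = expected_heterozygosity([0.5, 0.5])

# Monomorphic and TOTAL counts taken from the table above.
pct_aflp = percent_polymorphic(64, 102)
pct_adf = percent_polymorphic(12, 72)
pct_pr8 = percent_polymorphic(14, 78)
```

Rounded to whole percentages, the three values agree with the "% polymorphic products" row of the table.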



This set of amplification products does not represent the entirety of the (CA)≥7.5(TA)≥2.5 loci in the soybean
genome. Within the biotin-selected subset of Taq I + Pst I double digested template fragments, amplification was limited
to only those (CA)≥7.5(TA)≥2.5 genomic loci for which this SSR is within an amplifiable distance of and is oriented
oppositely to a Taq I site, and for which no other Taq I site lies on the other side of the SSR, between the SSR and the
"selectable" Pst I site (see Figure 1). The remaining (CA)≥7.5(TA)≥2.5 loci in the genome can be amplified from template
DNAs constructed using different sets of restriction endonucleases. Figure 4 illustrates that when just one of the two
restriction enzymes is changed (Pst I is replaced by Hind III as selectable enzyme site), the pattern of amplified
fragments is markedly different from that produced with Pst I + Taq I. Therefore, generating templates restricted with
differing combinations of restriction endonucleases, as well as assaying each template preparation with a large set of
different compound SSR and adaptor primer combinations, should allow the detection of a greater proportion of the total
number of polymorphic SSR loci in any given genome.

In comparison to the low resolution of amplification products generated by SSR-to-adaptor amplification using
5'-anchored simple SSR primers (see Example 1), the alternative use of compound SSR primers allows for a much greater
level of product resolution. Each band/product is accompanied by far less stutter, and the overall background in the
lanes is noticeably reduced, even when a cold start amplification is used (compare Figure 2 to Figures 3a, 3b). This
lower amount of background permits good discrimination of individual products, allowing the assignment of allelic
relationships among bands to be made with a greater level of confidence. In some instances, a polymorphism might
arise from single nucleotide variation within the region covered by the restriction site or the adaptor-directed primer.
Nevertheless, a number of the polymorphisms (most of those categorized as codominant or both codominant and
dominant among the 15 cultivars tested) appear to arise from variation in the length of the compound SSR through
which primer extension occurs; these are recognizable as codominant polymorphisms whose sizes differ by multiples
of the unit length of the repeat. The ability to include repeat length variation as one type of polymorphism that is
identifiable by this assay allows for a higher level of polymorphism to be visualized from each amplification reaction.
It is possible that this assay will allow for the calculation of genetic distances, both within and between species, with
great efficiency. Evidence for this comes from similarity estimates (not shown) made from the data shown in Figures
3a and 3b; distances calculated for pairings of G. max-G. max genotypes from this data are less than the calculated
distances for G. max-G. soja pairings.

EXAMPLE 3 

AMPLIFICATION USING PERFECT COMPOUND SSR PRIMERS IN A MULTIPLEXED SSR-TO-ADAPTOR 
REACTION FOR GENETIC MAPPING 

Example 3 illustrates the use of the SSR-to-adaptor amplification method to generate genetic markers. To determine
whether the polymorphic bands detected between G. max, Bonus and G. soja, PI 81762 are genetically heritable,
and to determine whether these polymorphisms could have utility for genetic mapping, polymorphic products between
these strains were scored and mapped to the soybean genome.

Genomic template DNAs were made, as described in the MATERIALS AND METHODS, from 66 individuals at the
F2 generation from a cross involving the Bonus and soja parents (from T. Hymowitz, U. of Illinois). For this example,
Taq I and Hind III restriction endonucleases were used for the template DNA double digestions. The digestions, adaptor
ligations, and subsequent streptavidin-biotin selections were performed as described in MATERIALS AND METHODS.
The (CA)7.5(TA)2.5 compound SSR-directed primer (Table V) was 5'-end labeled using [γ-33P]ATP and then used
in combination with Taq.prS adaptor-directed primer (Table I) for multiplexed amplifications on the parent and F2
templates. These amplifications utilized the cold start, 56°C touchdown thermocycling protocol detailed in Example 2.
Amplification products were resolved on 6% denaturing polyacrylamide gels, then dried and exposed to Kodak Biomax
x-ray film. The autoradiograph of these products is shown in Figure 5a. At least 15 amplification products were
decisively polymorphic between the parental templates, and demonstrated Mendelian segregation among the F2
individuals. Most of these segregate as dominant polymorphisms, and the probability that each segregated in the F2 progeny
at a 3:1 or a 1:2:1 Mendelian ratio, simply by chance alone, was consistently less than 5%. The parental inheritance
of each polymorphic product was determined for each of the 66 F2 individuals, and the scores for 13 of these
polymorphisms are shown in Table VI. With this specific primer combination, most of the more unambiguous of the
polymorphic bands appear to segregate as dominant markers; only a few polymorphisms appear to segregate codominantly.
With other specific primer combinations, however, the incidence of codominant segregation is often higher. Most
codominant polymorphisms were more problematic to score in that homozygotes sometimes could not be distinguished
from heterozygotes. Therefore, each of the polymorphisms from this amplification was scored with a default
assumption of dominance, and instances of true codominance were revealed following mapping.
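
One conventional way to compare observed F2 counts with a 3:1 or 1:2:1 ratio is a chi-square goodness-of-fit test. The sketch below is an illustration only: the text does not state which statistical test was applied, the counts are hypothetical, and the closed-form tail probabilities cover only one or two degrees of freedom.

```python
import math


def chi_square(observed, ratio):
    """Goodness-of-fit statistic for observed counts against an expected ratio."""
    scale = sum(observed) / sum(ratio)
    expected = [r * scale for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value(chi2, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("this sketch handles only 1 or 2 degrees of freedom")


# Hypothetical dominant marker: band present / band absent among 66 F2 plants.
chi2_dominant = chi_square([50, 16], [3, 1])
p_dominant = p_value(chi2_dominant, df=1)

# Hypothetical codominant marker: parent-1 homozygote / heterozygote / parent-2 homozygote.
chi2_codominant = chi_square([16, 34, 16], [1, 2, 1])
p_codominant = p_value(chi2_codominant, df=2)
```

In this convention a large p-value means the counts are compatible with the tested ratio, and a value below 0.05 would argue against it.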

In order to map these bands to the soybean genome, these inheritance scores were correlated with those of 600
RFLP and single-locus SSR markers previously mapped to the soybean genome (J. A. Rafalski and S. V. Tingey, in:
Genetic Maps: Locus Maps of Complex Genomes, 6th Edition, Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, NY, 1993). This standard genetic map (data not shown) was constructed by standard RFLP methodology from
analysis of the segregation patterns of many RFLP markers in the same F2 population as used in this example (for
standard RFLP mapping technology, see T. Helentjaris, et al., 1986, Theor. Appl. Genet. 72, 761, which reference is
incorporated herein). The basis for genetic mapping analysis is that markers located near to each other in the genome
are inherited together in the F2 progeny, while markers located farther apart are co-inherited less frequently.
Segregation analysis and marker map positions were then calculated using a computer segregation analysis program,
MapMaker (E. S. Lander, et al., 1987, Genomics 1, 174), which had been modified by Applicants for the Macintosh. The
results indicated that nearly every polymorphic amplification product with an approximate 3:1 dominant segregation
ratio, or a 1:2:1 codominant segregation ratio, among the F2 progeny can be mapped to the soybean genome. This is
illustrated in Figure 5b, where 6 polymorphisms from Table VI all map to independent sites on various linkage groups.
In total, the 15 polymorphisms mapped from this single primer combination are distributed among 10 different linkage
groups; in some instances, two or more polymorphisms localize to linked sites on the same linkage group. The
probability that each of the polymorphisms in Figure 5b localize to these positions purely by chance from the observed data
varied from 1 in 10^-29 to 1 in lO^^-S^, indicating the strength of these map positions. This example demonstrates that 
polymorphisms revealed by the present invention, SSR-to-adaptor multiplexed amplification, have utility as genetic 
markers. 

DNA was isolated and templates were made from each of 66 F2 individuals segregating from a cross between
PI81762 and Bonus soybean lines. The genotype of each individual at each of the indicated marker loci was determined
as follows: A score of "A" or "B" designates that the locus was inherited from the PI81762 or Bonus parent, respectively.
A score of "H" designates that the locus was inherited from both PI81762 and Bonus. A score of "a" designates that the
locus was inherited only from PI81762 or that it was inherited from both Bonus and PI81762. A score of "b" designates
that the locus was inherited only from Bonus or that it was inherited from both Bonus and PI81762. A score of "m"
indicates missing data.
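
The scoring legend can be expressed as a small lookup table, together with a crude measure of how often two markers receive the same score across individuals. This is a sketch under stated assumptions: the score strings are hypothetical, the helper names are invented for illustration, and the concordance fraction is only an intuition aid for co-inheritance; it is not the likelihood-based calculation performed by MapMaker.

```python
SCORE_MEANING = {
    "A": "inherited from PI81762",
    "B": "inherited from Bonus",
    "H": "inherited from both PI81762 and Bonus",
    "a": "PI81762 only, or both parents",
    "b": "Bonus only, or both parents",
    "m": "missing data",
}


def tally(scores):
    """Count how many individuals received each score code."""
    counts = {code: 0 for code in SCORE_MEANING}
    for s in scores:
        if s not in counts:
            raise ValueError(f"unknown score code: {s!r}")
        counts[s] += 1
    return counts


def concordance(marker1, marker2):
    """Fraction of individuals scored identically at two markers, skipping missing data."""
    pairs = [(x, y) for x, y in zip(marker1, marker2) if "m" not in (x, y)]
    if not pairs:
        raise ValueError("no individuals scored at both markers")
    return sum(x == y for x, y in pairs) / len(pairs)


# Hypothetical scores for eight F2 individuals at two markers.
marker_1 = "AAHBHmBA"
marker_2 = "AAHBBHBA"
counts_1 = tally(marker_1)
shared = concordance(marker_1, marker_2)
```

Markers that lie close together would be expected to show a concordance near 1, while unlinked markers would agree far less often.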

EXAMPLE 4 

AMPLIFICATION USING PERFECT COMPOUND SSR PRIMERS IN A MULTIPLEXED SSR-TO-ADAPTOR
REACTION TO DETECT POLYMORPHISMS IN OTHER PLANT AND ANIMAL GENOMES

Example 4 illustrates the use of perfect compound SSR primers for the amplification of genomic DNA from other
non-soybean genomes including corn, salmon, human and mouse. Genomic DNA was isolated from the Z. mays inbred
cultivars B73, Mo17, ASKC28, LH82, LH119, LH204, AEC272, and CM37, using a urea extraction miniprep procedure
(Chen et al., in: The Maize Handbook, M. Freeling and V. Walbot, eds., (1993) pp 526-527, New York). Genomic DNA
from five different human sources, as well as from salmon and mouse (BALB/c), was purchased from commercial
sources (Sigma, St. Louis, MO or Clontech, Palo Alto, CA). All these DNAs were double digested either with Taq I +
Pst I or Taq I + Hind III, the restriction fragments ligated to adaptors specific to these restriction sites, and
biotin-streptavidin selections performed as described in Examples 1 and 2.

Amplification reactions were performed as described in Example 2, using several individual perfect compound
SSR-directed primers (33P-labeled), each primer in combination with individual Taq I adaptor-directed primers carrying
either zero or one 3'-selective nucleotide. The amplifications were performed using cold start, 56°C final touchdown
thermocycling conditions, and the labeled products resolved on 6% denaturing polyacrylamide gels. Examples of the
amplification products from these plant and mammalian genomes are shown in Figure 6, panels a, b and c.

In all genomes tested, every specific compound SSR and Taq I adaptor primer combination produced a distinct
set of amplification products (a fingerprint). In general, the degree to which any two fingerprints are similar is a function
of the evolutionary distance between the individuals. These SSR-to-adaptor amplification product fingerprints have
two components that can differ: the absolute number of fragments and their collective pattern on the gel. First, a given
SSR-directed primer may generate a completely different relative number of amplification products in one species
compared to another species, indicating that the compound SSR locus amplified by that primer may be present in
entirely different copy numbers in one phylogenetic group compared to another. For example, the number of
co-amplified fragments using the compound repeat (CA)7.5(TA)2.5 appears to be much greater (by at least 2-5 fold) in the
mammalian genomes than in soybean or corn (see Figure 6), indicating that mammals contain a greater number of
(CA)≥7.5(TA)≥2.5 target loci. This relationship is consistent with the greater estimated frequency of (CA)n repeats in
mammalian versus soybean or corn genomes (Wang et al., Theor. Appl. Genet. 88, 1 (1994); Morgante & Olivieri,
Plant J. (1993); Beckmann & Weber, Genomics 12, 627 (1992)). Second, within a narrow phylogenetic group (species
or genus), the pattern of amplification products between individuals is generally similar, reflecting the general
conservation of restriction sites and SSR loci in the different, yet closely related genomes. Polymorphisms between individuals
are detected whenever a particular restriction site or SSR locus that contributes to a given amplified fragment in one
genome carries a base substitution, insertion/deletion, or repeat length difference compared to the other genomes of
the same species. Virtually no similarities are obvious in the patterns of amplification products between individuals
whose evolutionary distance extends beyond the same genus, consistent with the accepted idea that the more diverged
the two genomes being compared, the more unlikely they will share common loci. The sets of amplified fragments are
entirely different, for example, between individual genomes from human compared to mouse or rat, and the amplification
patterns (and likely the set of amplified products) appear to share no similarities between soybean and mammals, or
even between soybean and corn.

EXAMPLE 5

EFFECTS OF VARYING THE PRIMER CONSTITUTION AND THERMOCYCLING CONDITIONS TO GENERATE
DIFFERENT SETS OF AMPLIFICATION PRODUCTS

Variability in the SSR-to-adaptor amplification reaction, and therefore in the products obtained, results not only
from the reaction and thermocycle setup conditions described above, but also from subtleties in the design of the primers
and the thermocycling parameters used in these amplifications. Once a particular compound SSR has been chosen
as the target locus sequence for this assay, then either partially or entirely different sets of amplification products still
can be generated by altering any one of the following primer design criteria: 1) the number and composition of the
3'-extension nucleotide(s) on the adaptor-directed primer; 2) the relative lengths of the two constituent simple repeats
that comprise the compound SSR primer; 3) the particular strand of the double-stranded compound SSR locus chosen
to correspond to the single-stranded primer (i.e., the directionality of the SSR primer). In addition, the quality of the
data generated by a particular amplification is affected by the mode by which thermocycling is initiated.

1. Design of the adaptor-directed primer:

This primer, which corresponds to the synthetic adaptor ligated to the restricted ends of the genomic DNA, can
carry a variable number (zero to ten) and arbitrary sequence of nucleotides at its 3'-end. As described by Zabeau (EP
534,858), these variable 3'-nucleotides on the primer anneal specifically to "unknown" sequences that are directly
adjacent to the adaptor and restriction site on a genomic DNA fragment, and the recognition by each of only a subset
of all possible fragments in the template mixture provides exquisite specificity in the amplification reaction. Since such
primers that are otherwise identical in sequence except for differences in the few 3'-most nucleotide(s) can amplify
completely nonoverlapping sets of amplification products, they behave much like allele-specific amplification primers
(Newton et al., (1989) Nuc. Acids Res. 17: 2503; Kwok et al., (1990) Nuc. Acids Res. 18: 999; Wu et al., (1989) Proc.
Natl. Acad. Sci. USA 86: 2757), except that their use requires no prior sequence knowledge of the genomic locus to
be amplified, and each primer will selectively co-recognize multiple target sites in a template DNA mixture.

In general, the longer the 3'-extension the more selective the primer; as the variable 3'-extension is made longer,
the adaptor primer becomes more restricted to recognize a smaller number of potential genomic target sites, leading
to a smaller real number of co-amplified products. The addition of each nondegenerate nucleotide onto the 3'-extension
leads to approximately 4-fold greater template discrimination. In addition, different single nucleotides at the 3'-most
base position(s) give unique template specificities to otherwise identical primers. These principles are illustrated by
the examples shown in Figure 7. The 33P-labeled perfect compound SSR primer, (CA)7.5(TA)2.5, was used for
SSR-to-adaptor amplification to generate specific products from the genomes of several soybean cultivars, using the
experimental conditions described in Example 2. This SSR primer was paired with each of several different Taq I
adaptor-directed primers, which all differ only at their 3'-most nucleotide positions (see Table I):



Primer:              .AdF     .pr5      .pr6      .pr7      .pr8      .pr9
3'-extension:        0        1 (-G)    1 (-A)    1 (-T)    1 (-C)    2 (-AC)



The Taq I adaptor-directed primer with the shortest 3'-extension (zero nucleotides), Taq.AdF, is completely nonselective
and generated the largest number of products; the primer with the longest extension (Taq.pr9) is the most selective
and resulted in the fewest amplification products. All four of the primers carrying one nucleotide as a 3'-extension
amplified approximately the same number of products; however, the set of products from each of these four primers
is unique compared to the sets from the other three. The complex pattern of bands amplified using Taq.AdF is actually
a composite of all four one-base extension primers. That is, the pattern generated by each of these four primers is
approximately a 1/4 subset of the pattern amplified by the zero-base extended primer. Thus, varying both the number
and composition of the 3'-selective nucleotides on the adaptor-directed primer is sufficient to generate individual, either
partially or completely, nonoverlapping sets of amplification products from the same template when paired with a given
SSR-directed primer. The choice of which 3'-extensions to use will depend largely upon relative nucleotide frequencies
in a target genome and upon the abundance in the genome of the specific SSR that serves as the other priming site.
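
The subset relationship described above can be sketched with an idealised template in which every possible dinucleotide flanks the adaptor exactly once. This is a toy model, not experimental data: the uniform set of flanking sequences is an assumption, the flanks are written in the primer's sense so that a simple prefix match stands in for annealing of the 3'-extension, and the function names are hypothetical.

```python
from itertools import product

# 3'-extensions of the Taq I adaptor-directed primers, as listed in the text.
PRIMER_EXTENSIONS = {
    "Taq.AdF": "",
    "Taq.pr5": "G",
    "Taq.pr6": "A",
    "Taq.pr7": "T",
    "Taq.pr8": "C",
    "Taq.pr9": "AC",
}


def amplified(flanks, extension):
    """Flanking sequences whose first bases match the primer's 3'-extension."""
    return {f for f in flanks if f.startswith(extension)}


def expected_fraction(n_selective):
    """Idealised fraction of fragments recognised with n selective nucleotides."""
    return 0.25 ** n_selective


# Assumed template: all 16 dinucleotides, each adjacent to one adaptor site.
flanks = {"".join(p) for p in product("ACGT", repeat=2)}
sets = {name: amplified(flanks, ext) for name, ext in PRIMER_EXTENSIONS.items()}
one_base = ["Taq.pr5", "Taq.pr6", "Taq.pr7", "Taq.pr8"]
union_of_one_base = set().union(*(sets[name] for name in one_base))
```

In this model the four one-base primers partition the Taq.AdF set into quarters, and the two-base primer selects a quarter of the Taq.pr6 subset. Real genomes depart from the 4-fold rule because nucleotide frequencies are not uniform.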


2. Relative lengths of the two constituent simple repeats comprising the compound SSR primer: 

Every simple and nearly every compound SSR locus in the genome is a double stranded structure whose individual
strands carry different permutations of nucleotides. A single-stranded primer that may specifically anneal to one strand
at a SSR locus will not anneal to the opposite strand (with the exception of the 9 specific palindromic compound SSR
sequence combinations; see Table II). Therefore, a given SSR-directed primer will primer-extend from each genomic
target locus in a polar, unidirectional manner, and any compound SSR locus can be recognized and primed from by
any of four different primer classes. In addition, each of these four canonical primer classes can include a wide range
of individual primers, all differing by the length of the two constituent repeats within the primer. Changes in the lengths
of these constituent repeats have profound effects on primer efficacy and the fidelity of reproducible amplifications; in
general, the longer the 5'-anchoring repeat relative to the 3'-priming repeat, the better the primer's specificity and
priming efficiency in the amplification.

Multiple, individual primers, each differing from the others by the length of its two constituent repeats, have been
tested for four different compound SSR sequences: (CT)x(AT)y, (AT)x(AG)y, (CA)x(TA)y, (AT)x(GT)y (i.e., the values of
x and y are varied for a particular SSR type). Figure 8 shows the results of a test using three different (CA)x(TA)y
primers, (CA)4.5(TA)7.5, (CA)6.5(TA)4.5, and (CA)7.5(TA)2.5, and three different (AT)x(GT)y primers, (AT)3.5(GT)6.5,
(AT)8.5(GT)2.5, and (AT)6.5(GT)4.5. All of these primers were calculated to have a Tm in the range of 38-42°C. Each
was 5'-labeled with 33P and paired individually with three different Taq I-directed primers (Taq.AdF, Taq.pr5 and
Taq.pr8) in amplification reactions using biotin-selected soybean template DNAs, PI81762 and Wolverine, as described in
Example 2. None of the (AT)x(GT)y primers performed very efficiently, although (AT)3.5(GT)6.5 generated at least
some products, whereas the other two (AT)x(GT)y primer versions failed completely to generate any amplification
products. In contrast, all three (CA)x(TA)y primers were able to generate products, although the number of amplified
fragments varied among the three primers. These results demonstrate that the longer the 5'-anchoring repeat and the
shorter the 3'-primer extension repeat, the more amplification products are produced. This same conclusion was drawn
from similar experiments performed with (CT)x(AT)y and (AT)x(AG)y primers carrying variable constituent repeat lengths
(not shown).

3. Polarity of the single stranded compound SSR-directed primer: 

The choice of which strand of a double-stranded compound SSR locus to use as a primer can be extremely critical
for determining the success of the SSR-to-adaptor amplification reaction. For some compound SSRs, one strand of
the double-stranded SSR was found to serve as an efficient primer whereas the opposite strand failed completely,
regardless of the relative lengths of the constituent repeats on the primer. This difference was most extreme for
compound SSRs containing an (AT)n repeat; the only type of (AT)n-containing primer that will lead to efficiently generated
amplification products under standard conditions (described in Example 2) is one in which the (AT)n sequence is very
short (1.5-3 repeat units) and is situated as the 3'-primer extension end. Figure 8, for example, illustrates the superior
efficiency of (CA)x(TA)y primers in contrast to (AT)x(GT)y primers, which represent the complementary strands of the
same compound SSR. All three of the (AT)x(GT)y primers tested were extremely inefficient at generating amplification
products (two were failures), even though the calculated Tm values of all the primers were approximately the same. It
is likely that this difference in primer efficiency results from the difference in placement of the (AT)n stretch within the
primer. A and T nucleotides display weak hydrogen bonding during base pairing, and oligonucleotides containing (AT)n
stretches often have self-complementarity artifacts in competition with weak annealing to the template. Therefore, it is
likely that an (AT)n stretch at the 5' end of a primer will serve poorly to anchor the primer to the template, whereas a
short (AT)n at the 3' end will have been well-anchored by the upstream non-(AT)n portion of the primer.

Primers corresponding to compound SSRs that are devoid of (AT)n repeats are affected much less by the relative
order of the two constituent repeats. For example, the individual primers in the two complementary primer sets,
(TC)4.5(AC)4.5 versus (TG)4.5(AG)4.5 (see Figure 3) and (CA)4.5(GA)4.5 vs. (TC)4.5(AC)4.5 (not shown), appear to generate
amplification products with approximately equivalent efficiencies.
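
The strand relationship between primer classes can be checked by expanding the repeat notation into sequence and taking the reverse complement. The sketch below assumes that a half-integer repeat count such as 7.5 means seven full repeat units followed by the first base of the unit; that reading of the notation, and the function names, are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand(unit, repeats):
    """Expand e.g. ("CA", 7.5) into CACACACACACACAC (assumed meaning of x.5)."""
    whole = int(repeats)
    seq = unit * whole
    if repeats != whole:
        seq += unit[0]  # the trailing half repeat is taken as the unit's first base
    return seq


def compound(unit5, n5, unit3, n3):
    """Primer sequence, 5'-anchoring repeat followed by 3'-priming repeat."""
    return expand(unit5, n5) + expand(unit3, n3)


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


primer = compound("CA", 7.5, "TA", 2.5)
opposite = reverse_complement(primer)
```

Under this reading, the opposite strand of (CA)7.5(TA)2.5 is (AT)2.5(GT)7.5, which places the (AT)n stretch at the 5' end, the arrangement the text reports as a poor anchor.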

4. Mode of thermocycling initiation 

The reaction setup protocol described in the previous examples is essentially a cold start, and allows the possibility
for primers to anneal under nonstringent conditions both to template sites that are not necessarily a perfect match,
and to multiple, staggered sites within a target locus (the latter leads to a stuttering effect of the amplification products
on the gel). This simple reaction setup protocol, nevertheless, routinely was sufficient to generate amplification products
that could be distinguished as polymorphic between genomes. In fact, the cold start reaction products from compound
SSR-directed primers were consistently quite sharp and distinct; however, comparable products derived from
5'-anchored simple SSR primers generally lacked the same clarity and sharpness (compare Figures 2 and 3). Much of this
indistinctness and individual product heterogeneity, for both types of SSR-directed primer, could be obviated by the
use of a hot start initiation for the thermocycling. A hot start protocol (Chou et al., Nuc. Acids Res. 20, 1717, 1992)
prevents spurious primer annealing to incorrect template sites prior to the first denaturation, and generates products
that resolve more sharply and discretely on the gel.

Otherwise identical amplification reactions were performed using the cold start procedure described in Examples
1 and 2, and also using two different hot start initiation procedures. For one type of hot start, all the components of
each amplification reaction were combined as described in MATERIALS AND METHODS, except that the SSR-directed
primer (both 5'-end labeled and unlabeled versions) was excluded from the full primer cocktail. The reaction tubes
were capped (18.5 uL reaction volume), and the first denaturation step of the thermocycling was performed (94°C, 3
min). The reactions were then held at 80°C while 1.5 uL of the appropriate SSR primer (1.0 uL of 5 ng/uL 33P-labelled
plus 0.5 uL of 50 ng/uL unlabelled) was added to each reaction. Exponential amplification was then initiated, using
either a constant 58°C annealing temperature or a touchdown (56° or 58°C final) thermocycling protocol. For the
second hot start method, Ampliwax PCR-50 gems were used essentially as described by the manufacturer (Stratagene,
La Jolla, CA), except that a mixture of 10X buffer, MgCl2, dNTPs, H2O, and primers (5.4 uL total) was first heated with
the Ampliwax and allowed to cool, then a second mixture (14.6 uL) consisting of template DNA, AmpliTaq DNA
polymerase, PCR buffer and H2O was added over the top of the re-solidified wax layer (final concentrations of each match
those in the standard cold start amplifications). Thermocycling was then initiated, using the touchdown annealing
protocol detailed in Example 2.
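
The volumes and primer amounts quoted for the two hot start procedures can be checked with simple arithmetic. The following is a bookkeeping sketch only; the function name is hypothetical, and the values are those stated in the paragraph above.

```python
def primer_addition(premix_ul, additions):
    """Return (final volume in uL, primer mass in ng, final primer concentration in ng/uL).

    additions is a list of (volume_ul, concentration_ng_per_ul) pairs.
    """
    added_ul = sum(volume for volume, _ in additions)
    mass_ng = sum(volume * conc for volume, conc in additions)
    final_ul = premix_ul + added_ul
    return final_ul, mass_ng, mass_ng / final_ul


# First hot start: 18.5 uL premix, then labeled and unlabeled SSR primer added at 80 C.
final_ul, primer_ng, primer_conc = primer_addition(18.5, [(1.0, 5.0), (0.5, 50.0)])

# Second hot start: lower mix under the wax plus upper mix over the wax.
wax_total_ul = 5.4 + 14.6
```

Both procedures come to a 20 uL reaction, and the first delivers 5 ng of labeled plus 25 ng of unlabeled SSR primer, or 1.5 ng/uL in the final volume.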

The amplifications illustrated in Figures 2-8 all employed a cold start protocol. A direct comparison of the two
initiation procedures, however, is shown in Figure 9. Two sets of amplifications using the perfect compound SSR primer,
(CA)7.5(TA)2.5, paired with three different Taq I adaptor primers (Taq.AdF, Taq.pr6, Taq.prS) were performed using the
conditions described in Example 2 (56°C final touchdown temperature); one set was initiated with a standard cold start,
the other with the first hot start protocol described above. These results demonstrate that a hot start is superior for
generating the most discrete bands with the least amount of stutter, although the cold start is adequate for producing
products that nonetheless are discernible as polymorphic between genotypes.

In general, a hot start produced the sharpest product bands on the gel for nearly every SSR-directed primer tested.
The most extreme differences between these protocols were observed when 5'-anchored simple SSR primers were
used. In fact, a cold start using 5'-anchored simple SSR primers often led to unacceptably smeared and fuzzy product
bands. The differences were generally more subtle for compound SSR primers, although some SSR primers (those
with long stretches of (AT)n as the 5'-anchor) failed to produce any product under hot start conditions.

EXAMPLE 6 

AMPLIFICATION USING PERFECT COMPOUND SSR-DIRECTED PRIMERS FOR INTER-SSR AMPLIFICATIONS
TO DETECT POLYMORPHISMS

Example 6 demonstrates the use of perfect, in-phase compound SSR primers in a single-primer amplification for
the production and detection of genetic polymorphisms. Simple SSR sequences containing 3 degenerate bases at the
extreme 5'-end have been shown previously to serve as efficient primers for amplification between neighboring SSR
sequences in the genome (Zietkiewicz, et al., Genomics, 20, 176, (1994)). Similarly, primers corresponding to
compound SSR sequences also are useful and extremely efficient for generating inter-SSR amplification products in a
single-primer PCR reaction. Generally, the same compound SSR primers that are most efficient in SSR-to-adaptor
amplifications, those with in-phase sequences and corresponding to the most abundant of the compound SSRs in the
genome (see Table II), also are the most efficient for generating inter-SSR amplification products.

Figure 10 illustrates the single-primer amplification products obtained using both 5'-anchored simple SSR primers
and compound SSR primers, from three different cold start thermocycling protocols: 58°C constant, 58°C touchdown,
and 56°C touchdown (described in Examples 1 and 2). SSR primers were 5'-end labeled with [γ-33P]ATP as described
in MATERIALS AND METHODS. Either 20 ng undigested genomic DNA or the standard amount of digested,
biotin-selected adaptor modified template DNA were combined with the compound SSR primer (5 ng labeled primer combined
with 25 ng unlabeled primer) and the other non-primer components of the amplification reactions described in the
previous examples. No adaptor-directed primers were added. These reactions were performed using both hot start
(not shown) and cold start conditions, although the products resulting from hot start were more discrete and better
resolved on the gels.

Figure 10, panel a shows a comparison of the products from undigested genomic DNA for soybean (Bonus and
PI 81762) and corn (B73 and CM37) cultivars using the 5'-anchored simple SSR primer, DBD(AC)7.5, generated with
the three thermocycling profiles indicated. Panel b illustrates a comparison of the amplification products obtained using
the 5'-anchored simple SSR primers, DBD(AC)7.5 and HBH(AG)8.5, and perfect compound SSR primers, (AT)3.5(AG)7.5
and (AT)3.5(GT)6.5, from both undigested and Taq I + Pst I digested, biotin-selected template DNAs from soybean
Wolverine and PI 81762. Two different thermocycling methods were used, as indicated.

The undigested DNA templates produced a greater number of amplification products than did the digested
templates. The reactions using digested, adaptor-ligated templates served as single-primer controls for the SSR-to-adaptor
amplifications described in Examples 1 and 2; relatively few fragments were amplified by the single SSR primer from
these cut DNA templates, indicating that most of the amplification products observed in the SSR-to-adaptor reactions
are dependent upon the presence of both the SSR sequence and a neighboring adaptor sequence on each digested
DNA fragment. The few single-primer products that are visible may result either from bona fide inter-SSR amplification
within single DNA fragments, or perhaps from some sort of inter-fragment pairing.

These results demonstrate that compound, as well as simple, SSR primers can generate inter-SSR amplification
products from the corn and soybean genomes. The multiple products generated from an individual SSR primer are a
mixture of nonpolymorphic and polymorphic fragments. The polymorphic bands between genotypes indicate length
differences between or within neighboring SSR sequences in the genome; these fragments are potential markers for
genome identification, fingerprint analysis, or marker assisted selection.

EXAMPLE 7

CONVERSION OF SSR-TO-ADAPTOR BAND POLYMORPHISMS TO SINGLE-LOCUS SSR MARKERS 

Once a good SSR is found by using the SSR-to-adaptor amplification (or SAMPL) method, it often may be desirable
to focus only on this single SSR, or just a few SSRs, for subsequent analysis of a species, and to examine variation
at these few SSRs using a more straightforward, nonradioactive, single-locus method.

To accomplish conversion of a band from a SAMPL gel to a single-locus marker, this band is first excised from the
dried gel (see Figure 11). The DNA from the band is eluted from the gel by heating in 100 µL H2O at 95°C for 15 min. The
debris is pelleted by centrifugation for 2 min at 12000 rpm, and the DNA in the supernatant is precipitated by adding 0.1
volume 3 M sodium acetate, pH 5.3, 0.025 volumes 20 mg/mL glycogen and 2.5 volumes ethanol, incubating at -70°C
for 30 min, then centrifuging at 12000 rpm for 10 min. The pelleted DNA is washed once in 70% ethanol, air-dried, then
resuspended in 10 µL H2O. One µL of this DNA then is used as template for PCR amplification using conditions as
described in Example 2, except that the SSR-directed primer and the adaptor-directed primer (each at 1.5 ng/µL final
concentration; corresponding exactly to the primer pair used for the original amplification) each are unlabeled (see
Figure 11). These re-amplification products are purified using a Qiagen (Chatsworth, CA) PCR fragment cleanup kit,
and the purified DNA fragments either are subcloned into a suitable T-vector (for example, pGEM-T, Promega, Madison,
WI) and the insert sequenced using vector-directed primers, or are sequenced directly without subcloning, using the
adaptor-directed primer as the sequencing primer. In either case, the DNA sequence for the amplified fragment is
obtained, allowing design of the first locus-specific flanking primer (lsfp-1). This primer corresponds to the unique
sequence flanking the SSR on the amplified fragment, and is oriented with its 3'-end toward the SSR (Figure 11).
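
The reagent amounts in the precipitation step follow directly from the stated volume ratios. The short Python sketch below works them out; it assumes that "volumes" are taken relative to the eluate volume (100 µL here), and the function name is introduced only for illustration.

```python
def precipitation_volumes(sample_ul):
    """Reagent volumes (in microliters) for the ratios given in the text."""
    return {
        "sodium_acetate_3M": 0.1 * sample_ul,      # 0.1 volume
        "glycogen_20mg_per_mL": 0.025 * sample_ul, # 0.025 volumes
        "ethanol": 2.5 * sample_ul,                # 2.5 volumes
    }


# Assumption: the full 100 microliter eluate is precipitated.
for reagent, volume in precipitation_volumes(100.0).items():
    print(f"{reagent}: {volume:.1f} uL")
```

For a 100 µL eluate this gives roughly 10 µL of sodium acetate, 2.5 µL of glycogen and 250 µL of ethanol.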

The lsfp-1 primer then is paired with a primer corresponding to the second adaptor used for the initial preparation
of the restriction fragment templates, for amplification across the targeted SSR. This amplification can be performed
with or without radiolabeling the lsfp-1 primer, although the nonspecific background is reduced with a radiolabeled
lsfp-1 primer. The specifically amplified adaptor-to-lsfp-1 band is excised from the gel and sequenced as described above.
From the resulting DNA sequence of the other flanking region of the SSR, a second locus-specific flanking primer,
lsfp-2, then is designed. The 3'-end of this primer is oriented toward the SSR, oppositely to lsfp-1.
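
The orientation of the two flanking primers can be sketched in Python. This is an illustration under stated assumptions, not the procedure used in the Example: the fragment sequence, the 8-base primer length and the function names are hypothetical, and the sketch assumes that both flanks are already known on a single strand.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def flanking_primers(fragment, motif, primer_len=8, min_units=4):
    """Return (lsfp1, lsfp2), each written 5' to 3' with its 3'-end toward the SSR."""
    match = re.search(f"(?:{motif}){{{min_units},}}", fragment)
    if match is None:
        raise ValueError("no SSR with this motif found")
    start, end = match.span()
    # lsfp-1: upstream flank on the given strand; its 3'-end abuts the SSR.
    lsfp1 = fragment[max(0, start - primer_len):start]
    # lsfp-2: downstream flank, reverse-complemented so that it points back.
    lsfp2 = reverse_complement(fragment[end:end + primer_len])
    return lsfp1, lsfp2


# Hypothetical fragment: unique flank, (CA)6 repeat, unique flank.
fragment = "GATTACAGGCTTGG" + "CA" * 6 + "TGGACCTTAGCAAT"
print(flanking_primers(fragment, "CA"))
```

In practice primer length would be chosen from melting temperature and specificity considerations rather than fixed at a constant.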

Finally, the lsfp-1 and lsfp-2 unique primers are paired and used for PCR amplification either using restriction
fragmented DNA template mixtures or using unrestricted genomic DNA templates. Use of this primer pair generates
a locus-specific marker that spans the targeted SSR. Repeat length variation at this SSR now can be detected
quickly and nonradioactively from any undigested genome using these specific flanking region primers.
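
How a flanking primer pair reports repeat length variation can be shown with a small in-silico sketch in Python. The allele sequences and primers are hypothetical, and the model assumes exact primer matching on a single template strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_length(template, forward, reverse):
    """Length of the product bounded by the two primers, or None if absent."""
    fwd_site = template.find(forward)
    if fwd_site == -1:
        return None
    # The reverse primer anneals to the opposite strand, so search the
    # template for its reverse complement downstream of the forward primer.
    rev_target = reverse_complement(reverse)
    rev_site = template.find(rev_target, fwd_site + len(forward))
    if rev_site == -1:
        return None
    return rev_site + len(rev_target) - fwd_site


# Hypothetical alleles differing only in the number of CA repeat units.
left, right = "GATTACAGGCTTGG", "TGGACCTTAGCAAT"
allele_a = left + "CA" * 6 + right
allele_b = left + "CA" * 9 + right
lsfp1, lsfp2 = "AGGCTTGG", "AAGGTCCA"

len_a = amplicon_length(allele_a, lsfp1, lsfp2)
len_b = amplicon_length(allele_b, lsfp1, lsfp2)
print(len_a, len_b, len_b - len_a)
```

The three extra dinucleotide units in the second allele appear as a six-base difference in product length, which is the kind of difference a sizing gel resolves.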

SEQUENCE LISTING 

(1) GENERAL INFORMATION:

(i) APPLICANT: MORGANTE, MICHELE
VOGEL, JULIE M.

(ii) TITLE OF INVENTION: COMPOUND MICROSATELLITE PRIMERS FOR THE
DETECTION OF GENETIC POLYMORPHISMS

(iii) NUMBER OF SEQUENCES: 89

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: E. I. DU PONT DE NEMOURS AND COMPANY

(B) STREET: 1007 MARKET STREET

(C) CITY: WILMINGTON

(D) STATE: DELAWARE

(E) COUNTRY: U.S.A.

(F) ZIP: 19898

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: FLOPPY DISK

(B) COMPUTER: IBM PC COMPATIBLE

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PATENTIN RELEASE #1.0, VERSION 1.25

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE:

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: 08/346,456

(B) FILING DATE: 28 NOVEMBER 1994

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: FLOYD, LINDA AXAMETHY

(B) REGISTRATION NUMBER: 33,692

(C) REFERENCE/DOCKET NUMBER: BB-1064-A

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 302-892-8112

(B) TELEFAX: 302-992-7949

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

CACACACACA CACACACACA CATATATATA TATATATATA 

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

TATATATATA TATATATATG TGTGTGTGTG TGTGTGTGTG 

(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 

AGAGAGAGAG AGAGAGA 

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

GAGAGAGAGA GAGAGAG



(2) INFORMATION FOR SEQ ID NO:5: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

TCTCTCTCTC TCTCTCT

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 



CTCTCTCTCT CTCTCTC 

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

ACACACACAC ACACA

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: 

CACACACACA CACAC 

(2) INFORMATION FOR SEQ ID NO:9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 

TGTGTGTGTG TGTGT 

(2) INFORMATION FOR SEQ ID NO:10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: 

GTGTGTGTGT GTGTG 

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:



CCGGTTTTTT TTTT 



(2) INFORMATION FOR SEQ ID NO:12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:



GCGCAAAAAA AAAA 



(2) INFORMATION FOR SEQ ID NO:13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 



ACACACACAC ACA 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 






EP 0 804 618 B1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

AGAGAGAGAG AGA 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: 

TGTGTGTGTG TGT 

(2) INFORMATION FOR SEQ ID NO: 16: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

CGGCACACAC ACACACA 

(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: 

CTCGTAGACT GCGTACATGC A

(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

TGTACGCAGT CTAC 

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

CTCGTAGACT GCGTACC 

(2) INFORMATION FOR SEQ ID NO:20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

AGCTGGTACG CAGTC 

(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

GACGATGAGT CCTGAC 

(2) INFORMATION FOR SEQ ID NO:22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 

CGGTCAGGAC TCAT 

(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

GGAATTCTGG ACTCAGT 

(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:

GATCACTGAG TCCAGAATTC C

(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

TGGCCTTTAC AGCGTC 

(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 

TACACGCTGT AAAG 

(2) INFORMATION FOR SEQ ID NO:27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 

CTCGTAGACT GCGTACC

(2) INFORMATION FOR SEQ ID NO:28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

CTGCGTACCA GCTTACA 17

(2) INFORMATION FOR SEQ ID NO:29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:

CTGCGTACCA GCTTACC 17

(2) INFORMATION FOR SEQ ID NO:30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: 

CTGCGTACCA GCTTAAC 17 



(2) INFORMATION FOR SEQ ID NO:31: 
(i) SEQUENCE CHARACTERISTICS: 


(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:

CTGCGTACCA GCTTGTC 

(2) INFORMATION FOR SEQ ID NO:32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: 

CTGCGTACCA GCTTAC 

(2) INFORMATION FOR SEQ ID NO:33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:

CTGCGTACCA GCTTAA 

(2) INFORMATION FOR SEQ ID NO:34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:

CTCGTAGACT GCGTACATGC A

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35: 

GACTGCGTAC ATGCAGAC 

(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

GACTGCGTAC ATGCAGAA 

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: 

GACTGCGTAC ATGCAGCA 

(2) INFORMATION FOR SEQ ID NO:38: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 



GACTGCGTAC ATGCAGTT 



(2) INFORMATION FOR SEQ ID NO:39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: 



GACTGCGTAC ATGCAGA 



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: 



GACTGCGTAC ATGCAGC 



(2) INFORMATION FOR SEQ ID NO:41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:

GACGATGAGT CCTGAC 16

(2) INFORMATION FOR SEQ ID NO:42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:

ATGAGTCCTG ACCGA

(2) INFORMATION FOR SEQ ID NO:43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:

TGAGTCCTGA CCGAACC 17

(2) INFORMATION FOR SEQ ID NO:44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:

TGAGTCCTGA CCGAACA

(2) INFORMATION FOR SEQ ID NO:45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45:

TGAGTCCTGA CCGACAC 

(2) INFORMATION FOR SEQ ID NO:46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: 

TGAGTCCTGA CCGACAA 

(2) INFORMATION FOR SEQ ID NO:47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 

ATGAGTCCTG ACCGAG

(2) INFORMATION FOR SEQ ID NO:48:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: 

ATGAGTCCTG ACCGAA 

(2) INFORMATION FOR SEQ ID NO:49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49: 

ATGAGTCCTG ACCGAT 

(2) INFORMATION FOR SEQ ID NO:50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50: 

ATGAGTCCTG ACCGAC 

(2) INFORMATION FOR SEQ ID NO:51:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:51:

TGAGTCCTGA CCGAAC 16

(2) INFORMATION FOR SEQ ID NO:52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52:

TGAGTCCTGA CCGAAA 16 



(2) INFORMATION FOR SEQ ID NO:53: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53:

TGAGTCCTGA CCGACA 16

(2) INFORMATION FOR SEQ ID NO:54: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:54:

GGAATTCTGG ACTCAGT 17

(2) INFORMATION FOR SEQ ID NO:55:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:55: 

GGAATTCTGG ACTCAGTGAT C 21 



(2) INFORMATION FOR SEQ ID NO:56: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:56:

TTCTGGACTC AGTGATCT 18

(2) INFORMATION FOR SEQ ID NO:57:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57: 

TCTGGACTCA GTGATCTT 18

(2) INFORMATION FOR SEQ ID NO:58:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58:

CTGGACTCAG TGATCTTC 18

(2) INFORMATION FOR SEQ ID NO:59:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59:

TGGCCTTTAC AGCGTC

(2) INFORMATION FOR SEQ ID NO:60:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:60:

GCCTTTACAG CGTCTAAT

(2) INFORMATION FOR SEQ ID NO:61:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:61:

CCTTTACAGC GTCTAATC 

(2) INFORMATION FOR SEQ ID NO:62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:62: 

CCTTTACAGC GTCTAATCA 

(2) INFORMATION FOR SEQ ID NO:63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:63: 

TATATATAGA GAGAGAGAGA GA 

(2) INFORMATION FOR SEQ ID NO:64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:64:

ATATATATAT ATATAGAGAG AGAG

(2) INFORMATION FOR SEQ ID NO:65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:65: 

CTCTCTCTCT ATATATATAT ATAT 

(2) INFORMATION FOR SEQ ID NO:66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:66:

TCTCTCTCTC TCTCTATATA TA 

(2) INFORMATION FOR SEQ ID NO:67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:67: 

CTCTCTCTCT CTCTCTATA 

(2) INFORMATION FOR SEQ ID NO:68:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:68: 

TATATATGTG TGTGTGTGTG 20 



(2) INFORMATION FOR SEQ ID NO:69:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:69: 

TATATATATA TATGTGTGTG TG 22 

(2) INFORMATION FOR SEQ ID NO:70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:70: 

TATATATATA TATATATGTG TGTG 24 

(2) INFORMATION FOR SEQ ID NO:71:
(i) SEQUENCE CHARACTERISTICS: 



(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO:71:

ACACACACAT ATATATATAT ATAT 

(2) INFORMATION FOR SEQ ID NO:72: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72: 

ACACACACAC ACATATATAT AT 

(2) INFORMATION FOR SEQ ID NO:73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:73: 

ACACACACAC ACACATATAT 

(2) INFORMATION FOR SEQ ID NO:74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:74: 

TGTGTGTGTG TGTGTATAT

(2) INFORMATION FOR SEQ ID NO:75:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:75: 

GAGAGAGAGA GAGAGATAT 

(2) INFORMATION FOR SEQ ID NO:76: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:76: 

CTCTCTCACA CACACACA 

(2) INFORMATION FOR SEQ ID NO:77: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:77: 

CTCTCTCTCA CACACACA 

(2) INFORMATION FOR SEQ ID NO:78: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:78:



GTGTGTGTGA GAGAGAGA 



(2) INFORMATION FOR SEQ ID NO:79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:79: 



ACACACACAG AGAGAGAG 



(2) INFORMATION FOR SEQ ID NO:80: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:80: 



CTCTCTCTCT GTGTGTGT 



(2) INFORMATION FOR SEQ ID NO:81:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:81:

AGAGAGAGTG TGTGTGTG 

(2) INFORMATION FOR SEQ ID NO:82: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:82: 

TATATATGTG TGTGTGTGTG 

(2) INFORMATION FOR SEQ ID NO:83: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:83: 

TATATATATA TATGTGTGTG TG 

(2) INFORMATION FOR SEQ ID NO:84: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:84: 

TATATATATA TATATATGTG TG

(2) INFORMATION FOR SEQ ID NO:85:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:85: 

TATATATATA TATATATATA TATGTGTGTG TGTGTGTGTG TGT 

(2) INFORMATION FOR SEQ ID NO:86: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:86: 

TATATATATA TATATATATA TATCACACAC ACACACACAC ACA 

(2) INFORMATION FOR SEQ ID NO:87:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:87: 

TATATATATA TATATACACA CACA 

(2) INFORMATION FOR SEQ ID NO:88: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:88: 

TATATATATA CACACACACA CA 22 



(2) INFORMATION FOR SEQ ID NO:89: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:89: 


TATATACACA CACACACACA 20 
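
Each entry in the listing declares a LENGTH that should equal the number of bases in its sequence line. A minimal Python sketch of that consistency check follows; the three entries are copied from the listing above, while the function names are introduced only for illustration.

```python
def sequence_length(sequence_line):
    """Count bases in a listing line, ignoring spaces between blocks of ten."""
    return sum(1 for ch in sequence_line.upper() if ch in "ACGT")


def matches_declared_length(declared, sequence_line):
    return sequence_length(sequence_line) == declared


# (declared LENGTH, sequence line) for SEQ ID NO:3, NO:17 and NO:89.
entries = {
    3: (17, "AGAGAGAGAG AGAGAGA"),
    17: (21, "CTCGTAGACT GCGTACATGC A"),
    89: (20, "TATATACACA CACACACACA"),
}
for seq_id, (declared, line) in entries.items():
    print(seq_id, matches_declared_length(declared, line))
```

Because only the letters A, C, G and T are counted, a trailing length figure printed at the end of a sequence line does not affect the result.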


Claims 

1. An improved method of detecting polymorphisms between two individual nucleic acid samples comprising
amplifying segments of nucleic acid from each sample using primer-directed amplification and comparing said amplified
segments to detect differences, the improvement comprising wherein at least one of the primers used in said
amplification consists of a perfect compound simple sequence repeat in which two different repeating sequences
are either directly adjacent or are separated by no more than three intervening bases.
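
The structural condition on the primer can be made concrete with a hedged Python sketch. It is one possible reading, not a definition taken from the claims: the sketch assumes di- or trinucleotide motifs, at least two whole units of each repeat, and that a phase-shifted copy of the same motif (for example AC followed by CA) does not count as a different repeating sequence. All function names are hypothetical.

```python
def is_rotation(a, b):
    """True if b is a cyclic rotation of a (including a itself)."""
    return len(a) == len(b) and b in a + a


def parse_compound_ssr(primer, max_gap=3, min_units=2):
    """Return (motif1, units1, gap, motif2, units2), or None if no parse fits."""
    for k1 in (2, 3):
        m1 = primer[:k1]
        if len(m1) < k1 or len(set(m1)) < 2:
            continue
        run = 1  # number of whole copies of m1 at the 5' end
        while primer[run * k1:(run + 1) * k1] == m1:
            run += 1
        # Prefer the longest first repeat, then the shortest gap.
        for units1 in range(run, min_units - 1, -1):
            for gap in range(max_gap + 1):
                rest = primer[units1 * k1 + gap:]
                for k2 in (2, 3):
                    m2 = rest[:k2]
                    if len(m2) < k2 or len(set(m2)) < 2 or is_rotation(m1, m2):
                        continue
                    units2, extra = divmod(len(rest), k2)
                    if extra == 0 and units2 >= min_units and rest == m2 * units2:
                        gap_seq = primer[units1 * k1:units1 * k1 + gap]
                        return m1, units1, gap_seq, m2, units2
    return None


print(parse_compound_ssr("TATATATAGAGAGAGAGAGAGA"))  # SEQ ID NO:63
print(parse_compound_ssr("ACACACACACACACA"))         # SEQ ID NO:7, simple repeat
```

Under these assumptions SEQ ID NO:63 parses as four TA units directly followed by seven GA units, while the simple repeat of SEQ ID NO:7 yields no parse.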

2. The process of Claim 1 wherein said perfect compound simple sequence repeat primer is described by formula I

5' (XY)n(NM)n 3'     I

wherein:

n is independently 2-15;
X is A, C, T or G;
Y is A, C, T or G;
N is A, C, T or G;
M is A, C, T or G;

and provided that:

X ≠ Y;
N ≠ M; and
XY ≠ NM.

3. The process of Claim 1 wherein n is independently 4 to 8.

4. The process of Claim 1 wherein said perfect compound simple sequence repeat primer is described by formula II

5' (XYZ)m(LMP)m 3'     II

wherein:

m is independently 2-10;
X is A, C, T or G;
Y is A, C, T or G;
Z is A, C, T or G;
M is A, C, T or G;
N is A, C, T or G;
P is A, C, T or G;

and provided that: X, Y and Z are not all the same;
L, M and P are not all the same;

and

XYZ ≠ NMP.

5. The process of Claim 4 wherein m is independently 2 to 4. 

6. The process of Claim 1 wherein said perfect compound simple sequence repeat primer is in-phase, described by
Formula III or IV

5' (XY)n(XZ)n 3'     III

5' (YX)n(ZX)n 3'     IV

wherein:

n is independently 2-15;
X is A, C, T or G;
Y is A, C, T or G;
Z is A, C, T or G;

and provided that:

Y ≠ X;
Z ≠ X; and
Y ≠ Z.

7. The process of Claim 6 wherein n is independently 4 to 8. 

8. The process of Claim 6 wherein the value of n for the 5' repeating dinucleotide is greater than the value of n for 
the 3' repeating dinucleotide. 

9. The process of Claim 6 wherein said in-phase perfect compound simple sequence repeat primer is selected from
the group consisting of:

5' (AC)n(AT)n 3'
(CA)n(TA)n
(AT)n(GT)n
(TA)n(TG)n
(TA)n(CA)n
(AT)n(AC)n
(TG)n(TA)n
(GT)n(AT)n
(TA)n(GA)n
(AT)n(AG)n
(TC)n(TA)n
(CT)n(AT)n
(AC)n(AG)n
(CA)n(GA)n
(CT)n(GT)n
(TC)n(TG)n
(TG)n(AG)n
(GT)n(GA)n
(CT)n(CA)n
(TC)n(AC)n
(AG)n(TG)n
(GA)n(GT)n
(CA)n(CT)n
5' (AC)n(TC)n 3'
wherein n is independently 2 to 15.

10. The process of Claim 1 wherein said primer-directed amplification is performed using a single primer consisting
of a perfect compound simple sequence repeat.

11. The process of Claim 1 wherein said perfect compound simple sequence repeat is in-phase.

12. A process for detecting polymorphisms between two samples of nucleic acid comprising separately treating each
nucleic acid sample according to the steps of a-d: 

a) digesting the nucleic acid with at least one restriction enzyme whereby restriction fragments are generated; 

b) ligating adaptor segments to the ends of the restriction fragments of step a); 

c) amplifying the fragments of step b) using primer-directed amplification wherein the amplification primers
comprise a first primer consisting of a perfect compound simple sequence repeat as defined in claim 1, and
a second primer comprising a sequence which is complementary to an adaptor segment of step b); and

d) comparing the amplified nucleic acid products of step c) from each nucleic acid sample to detect differences.



13. The process of Claim 12 in step c) wherein said first primer consists of a perfect compound simple sequence repeat
which is in-phase.

14. The process of Claim 12 in step c) wherein said second primer further comprises at the 3' end from 1 to 10 arbitrary
nucleotides.

15. The process of Claim 12 at step a) wherein two different restriction enzymes are used to digest said nucleic acid,
one restriction enzyme recognizing a tetranucleotide site on the sample nucleic acid and the other restriction
enzyme recognizing a hexanucleotide site on the sample nucleic acid; and further wherein at step b) two different
adaptor segments are ligated to the restriction fragments generated at step a).

16. The process of Claim 15 wherein at step b) one of the two adaptor segments carries a member of a binding pair.

17. The process of Claim 16 wherein said member of a binding pair is biotin.

18. The process of Claim 16 further comprising an additional step performed after step b):



b) (i) separating those fragments of step b) which carry a member of a binding pair from those fragments of
step b) which do not carry a member of a binding pair; and further at step c) wherein only those fragments at
step b) (i) which carry a member of a binding pair are amplified according to step c).

19. The process of Claim 12 at step c) wherein said first primer carries a reporter molecule. 

20. The process of Claim 19 wherein said reporter is 32P or 33P.

21. The process of Claim 12 at step c) wherein said amplification is performed using a touchdown thermocycling
protocol. 

22. The process of Claim 12 at step c) wherein said amplification is initiated using a hot start protocol. 

23. The process of Claim 13 wherein said in-phase perfect compound simple sequence repeat is selected from the 
group consisting of: 

5' (AC)n(AT)n 3'
(CA)n(TA)n
(AT)n(GT)n
(TA)n(TG)n
(TA)n(CA)n
(AT)n(AC)n
(TG)n(TA)n
(GT)n(AT)n
(TA)n(GA)n
(AT)n(AG)n
(TC)n(TA)n
(CT)n(AT)n
(AC)n(AG)n
(CA)n(GA)n
(CT)n(GT)n
(TC)n(TG)n
(TG)n(AG)n
(GT)n(GA)n
(CT)n(CA)n
(CA)n(CT)n
(AG)n(TG)n
(GA)n(CT)n
(CA)n(CT)n
5' (AC)n(TC)n 3'
wherein n is independently 2 to 15.

24. The process of Claim 23 wherein the value of n for the 5' repeating dinucleotide is greater than the value of n for
the 3' repeating dinucleotide.

25. A process for detecting polymorphisms between two samples of nucleic acid comprising separately treating each 
nucleic acid sample according to the steps of a-d: 

a) digesting the nucleic acid with at least one restriction enzyme whereby restriction fragments are generated; 

b) ligating adaptor segments to the ends of the restriction fragments of step a);

c) amplifying the fragments of step b) using primer-directed amplification wherein the amplification primers
comprise a first primer consisting of a simple sequence repeating region at the 3' end and a degenerate
nucleotide region at the 5' end; and a second primer comprising a sequence which is complementary to an
adaptor segment of step b); and

d) comparing the amplified nucleic acid products of step c) from each nucleic acid sample to detect differences. 

26. The process of Claim 25 at step c) wherein said first primer is described by Formula V,

5' (degenerate nucleotide)r(XY)n 3'    V

wherein:

X is A, C, T or G;
Y is A, C, T or G;
X ≠ Y;
r is 2 to 6; and
n is 2 to 15.

27. The process of Claim 25 at step c) wherein said first primer is described by Formula VI;

5' (degenerate nucleotide)r(XYZ)m 3'    VI

wherein:

X is A, C, T or G;
Y is A, C, T or G;
Z is A, C, T or G;
X, Y, and Z are not all the same;
r is 2 to 6; and
m is 2 to 10.
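
A primer of Formula V is in practice a pool of sequences, one for each resolution of the degenerate 5' positions. The sketch below expands such a pool. It assumes the degenerate positions are written with the standard IUPAC ambiguity codes, which is an assumption of this example and is not stated in the claims; the function name `expand_formula_v` is likewise hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (assumed notation for this sketch).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_formula_v(anchor: str, unit: str, n: int) -> list[str]:
    """List every concrete 5'->3' primer for (degenerate nucleotide)r(XY)n."""
    if not 2 <= len(anchor) <= 6:
        raise ValueError("r (the anchor length) must be 2 to 6")
    if len(unit) != 2 or unit[0] == unit[1] or not set(unit) <= set("ACGT"):
        raise ValueError("XY must be two different bases from A, C, G, T")
    if not 2 <= n <= 15:
        raise ValueError("n must be 2 to 15")
    choices = [IUPAC[code] for code in anchor]  # KeyError on an unknown code
    return ["".join(bases) + unit * n for bases in product(*choices)]


if __name__ == "__main__":
    for primer in expand_formula_v("RY", "CA", 2):
        print(primer)
```

The pool size is the product of the number of bases each degenerate position can take, so a two-position anchor of two-fold codes yields four primers.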



Patentansprüche

Claims 1 to 27 in German correspond to Claims 1 to 27 above.



Revendications

Claims 1 to 27 in French correspond to Claims 1 to 27 above.

EP 0 804 618 B1



FIG. 1b

FIG. 1c

PRIMER:

(AT)2.5(GT)6.5    TATATATGTGTGTGTGTGTG
(AT)6.5(GT)4.5    TATATATATATATGTGTGTGTG
(AT)8.5(GT)3.5    TATATATATATATATATGTGTG
(CA)6.5(TA)4.5    TATATATATACACACACACACA
(CA)4.5(TA)?.5    TATATACACACACACACACA
CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional
Application Serial No. 60/096,271, and U.S. Provisional Application Serial No. 60/130,810,
by Joseph A. Affholter, filed on August 12, 1998 and April 23, 1999, respectively. This
application is related to the copending application titled DNA SHUFFLING OF
DIOXYGENASE GENES FOR PRODUCTION OF INDUSTRIAL CHEMICALS by
Sergey A. Selifonov, Attorney Docket No. 018097-031100US, filed on an even day
herewith. This application is also related to U.S. Provisional Application Serial No.
60/096,28, filed August 12, 1998, U.S. Provisional Application Serial No. 60/111,146, filed
December 7, 1998, U.S. Provisional Application Serial No. 60/112,746, filed December 17,
1998. The disclosures of each of the above-referenced applications are incorporated herein by
reference in their entirety for all purposes.

FIELD OF THE INVENTION 

This invention pertains to the shuffling of nucleic acids to achieve or enhance
industrial production of chemicals by monooxygenase genes.

BACKGROUND OF THE INVENTION 

Organic acids, alcohols, aldehydes and epoxides are important classes of
industrial chemicals. Typically, these products are generated by successive oxidation of
inexpensive, high volume saturated and unsaturated hydrocarbons (ethane, propane, butane,
etc. and ethene, propene, butene, etc.) and simple aromatics such as benzene, ethylbenzene,
naphthalene, styrene and toluene.

Monooxygenases (MOs) such as the P450 oxygenases, heme-dependent
peroxidases, iron-sulfur MOs and quinone-dependent MOs typically catalyze limited
oxidation of these basic chemical building blocks. While potentially interesting from an
industrial standpoint, these enzymes typically exhibit neither the physical robustness nor
sufficient turnover numbers to make them usable as industrial catalysts. In addition,
regeneration of a reduced heme is required following each catalytic turnover. Biologically,
the necessary heme reduction is mediated in the P450 family of enzymes by NAD(P)H, an
expensive and impractical redox partner for most industrial chemistries.

Surprisingly, the present invention provides a method for providing enzymes
with higher activity, high physical stability and robustness. Also surprisingly, the present
invention provides a means of generating NADPH-independent monooxygenase activity in
the presence of peroxide co-substrates (as well as other inexpensive cofactors) thereby
solving each of the problems outlined above, as well as providing a variety of other features
which will be apparent upon review.

SUMMARY OF THE INVENTION

In the present invention, DNA shuffling is used to generate new or improved
monooxygenase genes. These monooxygenase genes are used to provide monooxygenase
enzymes, especially for industrial processes. These new or improved genes have
surprisingly superior properties as compared to naturally occurring monooxygenase genes.

In the methods for obtaining monooxygenase genes, a plurality of parental
forms (homologs) of a selected nucleic acid are recombined. The selected nucleic acid is
derived either from one or more parental nucleic acid(s) which encodes a monooxygenase
enzyme, or a fragment thereof, or from a parental nucleic acid which does not encode
monooxygenase, but which is a candidate for DNA shuffling to develop monooxygenase
activity. The plurality of forms of the selected nucleic acid differ from each other in at least
one (and typically two or more) nucleotides, and, upon recombination, provide a library of
recombinant monooxygenase nucleic acids. The library can be an in vitro set of molecules,
or present in cells, phage or the like. The library is screened to identify at least one
recombinant monooxygenase nucleic acid that exhibits distinct or improved monooxygenase
activity compared to the parental nucleic acid or nucleic acids.

Many formats for libraries of nucleic acids are known in the art and each of
these formats is generally applicable to the libraries of the present invention. For example,
basic texts generally disclosing library formats of use in this invention include Sambrook et
al., Molecular Cloning, A Laboratory Manual (2nd ed. 1989); Kriegler, Gene Transfer and
Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology
(Ausubel et al., eds., 1994).

In a preferred embodiment, the starting DNA segments are first recombined
by any of the formats described herein to generate a diverse library of recombinant DNA
segments. Such a library can vary widely in size from having fewer than 10 to more than
10^ 10^ or lO' members. In general, the starting segments and the recombinant libraries 
generated include full-length coding sequences and any essential regulatory sequences, such 
as a promoter and polyadenylation sequence, required for expression. However, if this is not 
the case, the recombinant DNA segments in the library can be inserted into a common vector 
providing the missing sequences before performing screening/selection. 

If the sequence recombination format employed is an in vivo format, the 
library of recombinant DNA segments generated already exists in a cell, which is usually the 
cell type in which expression of the enzyme with altered substrate specificity is desired. If 
sequence recombination is performed in vitro, the recombinant library is preferably 
introduced into the desired cell type before screening/selection. The members of the 
recombinant library can be linked to an episome or virus before introduction or can be 
introduced directly. In some embodiments of the invention, the library is amplified in a first 
host, and is then recovered from that host and introduced to a second host more amenable to 
expression, selection, or screening, or any other desirable parameter. 

The manner in which the library is introduced into the cell type depends on 
the DNA-uptake characteristics of the cell type (e.g., having viral receptors, being capable of 
conjugation, or being naturally competent). If the cell type is not susceptible to natural and 
chemical-induced competence, but is susceptible to electroporation, one preferably employs 
electroporation. If the cell type is not susceptible to electroporation as well, one can employ 
biolistics. The biolistic PDS-1000 Gene Gun (Biorad, Hercules, Calif.) uses helium pressure 
to accelerate DNA-coated gold or tungsten microcarriers toward target cells. The process is 
applicable to a wide range of tissues, including plants, bacteria, fungi, algae, intact animal
tissues, tissue culture cells, and animal embryos. One can employ electronic pulse delivery,
which is essentially a mild electroporation format for live tissues in animals and patients.
Zhao, Advanced Drug Delivery Reviews 17:257-262 (1995). Novel methods for making
cells competent are described in co-pending application U.S. patent application Ser. No. 
08/621,430, filed Mar. 25, 1996. After introduction of the library of recombinant DNA 
genes, the cells are optionally propagated to allow expression of genes to occur. 
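
The choice of delivery route described above can be read as a simple decision cascade. The following is a minimal sketch of that reading; the `CellType` fields and the function `choose_delivery_method` are hypothetical names, and the relative order in which viral transduction, conjugation and competence are tried is an assumption of this example, since the text does not rank them.

```python
from dataclasses import dataclass


@dataclass
class CellType:
    """DNA-uptake characteristics of a host cell type (illustrative fields)."""
    has_viral_receptors: bool
    conjugation_capable: bool
    competent: bool          # naturally or chemically inducible competence
    electroporatable: bool


def choose_delivery_method(cell: CellType) -> str:
    """Pick a route for introducing the recombinant library into the cell type."""
    if cell.has_viral_receptors:
        return "viral transduction"
    if cell.conjugation_capable:
        return "conjugation"
    if cell.competent:
        return "transformation of competent cells"
    if cell.electroporatable:
        # Preferred when competence is unavailable.
        return "electroporation"
    # Fallback when electroporation is not possible either.
    return "biolistics"


if __name__ == "__main__":
    print(choose_delivery_method(CellType(False, False, False, True)))
```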

In selecting for monooxygenase activity, a candidate shuffled DNA can be 
tested for encoded monooxygenase activity in essentially any synthetic process. Common 
processes that can be screened include screening for alkane oxidation (e.g., hydroxylation, 
formation of ketones, aldehydes, etc.), screening for alkene epoxidation, aromatic 
hydroxylation, N-dealkylation (e.g., of alkylamines), S-dealkylation (e.g., of reduced
thio-organics), O-dealkylation (e.g., of alkyl ethers), oxidation of aryloxy phenols, conversion of
aldehydes to acids, alcohols to aldehydes or ketones, dehydrogenation, decarbonylation,
oxidative dehalogenation of haloaromatics and halohydrocarbons, Baeyer-Villiger
monooxygenation, modification of cyclosporins, hydroxylation of mevastatin, hydroxylation
of erythromycin, N-hydroxylation, sulfoxide formation, hydroxylation of fatty acids,
hydroxylation of terpenes or oxygenation of sulfonylureas. Other oxidative transformations
will be apparent to those of skill in the art.

Similarly, instead of, or in addition to, testing for an increase in
monooxygenase specific activity, it is also desirable to screen for shuffled nucleic acids
which produce higher levels of monooxygenase nucleic acid or enhanced or reduced
recombinant monooxygenase polypeptide expression or stability encoded by the
recombinant monooxygenase nucleic acid.

A variety of screening methods can be used to screen a library, depending on
the monooxygenase activity for which the library is selected. By way of example, the library
to be screened can be present in a population of cells. The library is selected by growing the
cells in or on a medium comprising the chemical or compound to be oxidized or reduced and
selecting for a detected physical difference between the oxidized or reduced form of the
chemical or compound and the non-oxidized or reduced form of the chemical or compound,
either in the cell, or the extracellular medium.

Iterative selection for monooxygenase nucleic acids is also a feature of the 
invention. In these methods, a selected nucleic acid identified as encoding monooxygenase 
activity can be shuffled, either with the parental nucleic acids, or with other nucleic acids 
(e.g., mutated forms of the selected nucleic acid) to produce a second shuffled library. The 
second shuffled library is then selected for one or more forms of monooxygenase activity,
which can be the same as or different from the monooxygenase activity previously selected.
This process can be iteratively repeated as many times as desired, until a nucleic acid with
optimized properties is obtained. If desired, any monooxygenase nucleic acid identified by
any of the methods herein can be cloned and, optionally, expressed.

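
The iterative cycle described above (shuffle, select, reshuffle the selected nucleic acids) can be sketched in a few lines of Python. This is a toy illustration only: sequences are plain strings, `shuffle_pair` stands in for recombination using a single random crossover, and `toy_score` is a hypothetical stand-in for a monooxygenase activity assay. None of these names or parameter values come from the specification.

```python
import random

def shuffle_pair(a, b, rng):
    """Recombine two equal-length sequences at one random crossover point."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def make_library(parents, size, rng):
    """Build a shuffled library by recombining randomly chosen parent pairs."""
    return [shuffle_pair(*rng.sample(parents, 2), rng) for _ in range(size)]

def iterate_selection(parents, score, rounds=5, library_size=50, keep=4, seed=0):
    """Repeat shuffle -> screen -> select, carrying the best clones forward."""
    rng = random.Random(seed)
    pool = list(parents)
    for _ in range(rounds):
        library = make_library(pool, library_size, rng)
        # Screen the new library together with the current pool, so the best
        # score can never decrease from one round to the next.
        candidates = set(library + pool)
        pool = sorted(candidates, key=lambda s: (score(s), s), reverse=True)[:keep]
    return pool

# Hypothetical assay: count positions that match an arbitrary "ideal" sequence.
TARGET = "ATGGCTAAAGGT"

def toy_score(seq):
    return sum(x == y for x, y in zip(seq, TARGET))

parents = ["ATGGCTCCCCCC", "CCCCCCAAAGGT", "ATGCCCAAACCC"]
best = iterate_selection(parents, toy_score)[0]
```

Because each round re-screens the previously selected sequences alongside the new library, the best score in the pool is never lower than that of the starting parents; a real screen would replace `toy_score` with a measured activity.
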

The invention also provides methods of increasing monooxygenase activity
by whole genome shuffling. In these methods, a plurality of genomic nucleic acids are
shuffled in a cell (in whole cell shuffling, entire genomes are shuffled, rather than specific
sequences). The resulting shuffled nucleic acids are selected for one or more
monooxygenase traits. The genomic nucleic acids can be from a species or strain different
from the cell in which monooxygenase activity is desired. Similarly, the shuffling reaction
can be performed in cells using genomic DNA from the same or different species, or strains.
Strains or enzymes exhibiting enhanced MO activity can be identified.


The distinct or improved monooxygenase activity encoded by a nucleic acid
identified after shuffling can include one or more of a variety of properties, including: an
increased ability to chemically modify the monooxygenase target, an increase in the range of
monooxygenase substrates which the distinct or improved nucleic acid operates on, an
increase in the chemoselectivity of a polypeptide encoded by the nucleic acid, an increase in
the regioselectivity of a polypeptide encoded by the nucleic acid, an increase in the
stereoselectivity of a polypeptide encoded by the nucleic acid, an increased expression level
of a polypeptide encoded by the nucleic acid, a decrease in susceptibility of a polypeptide
encoded by the nucleic acid to protease cleavage, a decrease in susceptibility of a
polypeptide encoded by the nucleic acid to high or low pH levels, a decrease in susceptibility
of the protein encoded by the nucleic acid to high or low temperatures, a decrease in
peroxide-mediated enzyme inactivation, a decrease in toxicity to a host cell of a polypeptide
encoded by the selected nucleic acid, the ability to use low-cost reducing partners (rather
than NAD(P)H), and a reduction in the sensitivity of the polypeptide and/or an organism
expressing the polypeptide to inactivation by organic solvents and the feedstocks for and
products of the enzymatic oxidations.

The selected nucleic acids to be shuffled can be from any of a variety of 
sources, including synthetic or cloned DNAs. Exemplary targets for recombination include 
nucleic acids encoding P450 monooxygenases, nucleic acids encoding heme-dependent
peroxidases, nucleic acids encoding iron sulfur monooxygenases, nucleic acids encoding
quinone-dependent monooxygenases, and the like. Typically, shuffled nucleic acids are
cloned into expression vectors to achieve desired expression levels.

In addition to shuffling monooxygenase nucleic acids, it is occasionally 
desirable to produce shuffled nucleic acids which produce oxidizing/reducing equivalents in 
forms other than O2, H2O2 and NADPH, such as peroxides. Shuffled monooxygenase and
oxidase (H2O2) nucleic acids can be co-expressed in a single system to provide both
monooxygenase activity and peroxide in a single system. 

One feature of the invention is production of libraries and shuffling mixtures 
for use in the methods as set forth above. For example, a phage display library comprising 
shuffled forms of a nucleic acid is provided. Similarly, a shuffling mixture comprising at
least three homologous DNAs, each of which is derived from a nucleic acid encoding a 
polypeptide or polypeptide fragment is provided. These polypeptides can be, for example, 
P450 monooxygenases, heme-dependent peroxidases, iron sulfur monooxygenases, 
quinone-dependent monooxygenases, and the like. 

Isolated nucleic acids identified by selection of the libraries in the methods 
above are also a feature of the invention. 

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1. Schematic showing functional group insertion and modification using a monooxygenase.

Figure 2. Structures of exemplary feedstock olefinic compounds and structures of α-hydroxycarboxylic acids.

Figure 3. Enzymatic reaction schemes for multistep biochemical transformations of olefins to AHAs.

Figure 4. Enzymatic reaction schemes for converting free AHAs to ester derivatives.

Figure 5. Table of preferred MO reactions.

The absolute configuration of the chiral centers is not indicated in these 
Figures. The chiral centers of the chiral compounds can be R, S, or a mixture of these 
configurations. 

DETAILED DESCRIPTION OF THE INVENTION AND 
THE PREFERRED EMBODIMENTS 

Abbreviations

"AHA" refers to an α-hydroxycarboxylic acid.

"HCA" refers to a hydroxylated aromatic carboxylic acid.

"MO" refers to a monooxygenase.

Definitions

Unless clearly indicated to the contrary, the following definitions supplement
definitions of terms known in the art.

A "recombinant" nucleic acid is a nucleic acid produced by recombination 
between two or more nucleic acids, or any nucleic acid made by an in vitro or artificial 
process. The term "recombinant" when used with reference to a cell indicates that the cell
includes (and optionally replicates) a heterologous nucleic acid, or expresses a peptide or
protein encoded by a heterologous nucleic acid. Recombinant cells can contain genes that
are not found within the native (non-recombinant) form of the cell. Recombinant cells can
also contain genes found in the native form of the cell where the genes are modified and
re-introduced into the cell by artificial means. The term also encompasses cells that contain a
nucleic acid endogenous to the cell that has been artificially modified without removing the
nucleic acid from the cell; such modifications include those obtained by gene replacement,
site-specific mutation, and related techniques.

A "recombinant dioxygenase nucleic acid" is a recombinant nucleic acid
encoding a protein or RNA which confers dioxygenase activity to a cell when the nucleic
acid is expressed in the cell.

A "plurality of forms" of a selected nucleic acid refers to a plurality of 
homologs of the nucleic acid. The homologs can be from naturally occurring homologs 
(e.g., two or more homologous genes) or by artificial synthesis of one or more nucleic acids
having related sequences, or by modification of one or more nucleic acids to produce related
nucleic acids. Nucleic acids are homologous when they are derived, naturally or artificially,
from a common ancestor sequence. During natural evolution, this occurs when two or more
descendent sequences diverge from a parent sequence over time, i.e., due to mutation and
natural selection. Under artificial conditions, divergence occurs, e.g., in one of two ways.
First, a given sequence can be artificially recombined with another sequence, as occurs, e.g.,
during typical cloning, to produce a descendent nucleic acid. Alternatively, a nucleic acid
can be synthesized de novo, by synthesizing a nucleic acid which varies in sequence from a
given parental nucleic acid sequence.

When there is no explicit knowledge about the ancestry of two nucleic acids,
homology is typically inferred by sequence comparison between two sequences. Where two
nucleic acid sequences show sequence similarity it is inferred that the two nucleic acids
share a common ancestor. The precise level of sequence similarity required to establish
homology varies in the art depending on a variety of factors. For purposes of this disclosure,
two sequences are considered homologous where they share sufficient sequence identity to
allow recombination to occur between two nucleic acid molecules. Typically, nucleic acids
require regions of close similarity spaced roughly the same distance apart to permit
recombination to occur.

The terms "identical" or percent "identity," in the context of two or more
nucleic acid or polypeptide sequences, refer to two or more sequences or subsequences that
are the same or have a specified percentage of amino acid residues or nucleotides that are the
same, when compared and aligned for maximum correspondence, as measured using one of
the sequence comparison algorithms described below (or other algorithms available to
persons of skill) or by visual inspection.

The phrase "substantially identical," in the context of two nucleic acids or
polypeptides (e.g., DNAs encoding a dioxygenase, or the amino acid sequence of the
dioxygenase) refers to two or more sequences or subsequences that have at least about 60%,
preferably 80%, most preferably 90-95% nucleotide or amino acid residue identity, when
compared and aligned for maximum correspondence, as measured using one of the following
sequence comparison algorithms or by visual inspection. Such "substantially identical"
sequences are typically considered to be homologous. Preferably, the "substantial identity"
exists over a region of the sequences that is at least about 50 residues in length, more
preferably over a region of at least about 100 residues, and most preferably the sequences are
substantially identical over at least about 150 residues, or over the full length of the two
sequences to be compared.

For sequence comparison and homology determination, typically one 
sequence acts as a reference sequence to which test sequences are compared. When using a 
sequence comparison algorithm, test and reference sequences are input into a computer,
subsequence coordinates are designated, if necessary, and sequence algorithm program 
parameters are designated. The sequence comparison algorithm then calculates the percent 
sequence identity for the test sequence(s) relative to the reference sequence, based on the 
designated program parameters. 
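
As a minimal sketch of the percent-identity calculation and the "substantially identical" thresholds given above (at least about 60% identity over a region of at least about 50 residues), the following Python operates on two sequences that have already been aligned, with "-" marking gaps. The function names and the choice to count only gap-free columns in the denominator are assumptions made for illustration; alignment programs differ on that convention.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free alignment columns that hold the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

def substantially_identical(aligned_a, aligned_b, min_identity=60.0, min_length=50):
    """Apply the identity and region-length thresholds to an existing alignment."""
    compared = sum(x != "-" and y != "-" for x, y in zip(aligned_a, aligned_b))
    return compared >= min_length and percent_identity(aligned_a, aligned_b) >= min_identity

# Four of the five gap-free columns match.
example = percent_identity("ACGT-A", "ACCTGA")
```

The stricter preferred thresholds (80%, 90-95%; 100 or 150 residues) can be tested by passing different `min_identity` and `min_length` values.
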

Optimal alignment of sequences for comparison can be conducted, e.g., by
the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the
homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the
search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444
(1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA,
and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575
Science Dr., Madison, WI), or by visual inspection (see generally, Ausubel et al., infra).
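
For orientation, the local homology approach of Smith & Waterman can be sketched as a short dynamic-programming routine. The version below returns only the best local alignment score and uses a simple linear gap penalty; the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are illustrative assumptions, not parameters taken from the specification or from the GCG programs.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP matrix
    best = 0
    for x in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at zero lets an alignment restart anywhere.
            curr.append(max(0, diag, up, left))
            best = max(best, curr[j])
        prev = curr
    return best

score = smith_waterman_score("TTACGTTT", "GGACGTGG")
```

Here the only shared stretch is ACGT, so the best local score is four matches at two points each.
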

One example of an algorithm that is suitable for determining percent 
sequence identity and sequence similarity is the BLAST algorithm, which is described in 
Altschul et al., J. Mol. Biol. 215:403-410 (1990). Software for performing BLAST analyses
is publicly available through the National Center for Biotechnology Information
(http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring
sequence pairs (HSPs) by identifying short words of length W in the query sequence, which
either match or satisfy some positive-valued threshold score T when aligned with a word of
the same length in a database sequence. T is referred to as the neighborhood word score
threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for
initiating searches to find longer HSPs containing them. The word hits are then extended in
both directions along each sequence for as far as the cumulative alignment score can be
increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters
M (reward score for a pair of matching residues; always > 0) and N (penalty score for
mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to
calculate the cumulative score. Extension of the word hits in each direction is halted when:
the cumulative alignment score falls off by the quantity X from its maximum achieved
value; the cumulative score goes to zero or below, due to the accumulation of one or more
negative-scoring residue alignments; or the end of either sequence is reached. The BLAST
algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The
BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an
expectation (E) of 10, a cutoff of 100, M=5, N=-4, and a comparison of both strands. For
amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an
expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff (1989)
Proc. Natl. Acad. Sci. USA 89:10915).
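
The seed-and-extend procedure just described can be illustrated with a simplified nucleotide sketch. It is not the BLAST implementation: seeds here are exact word matches only (the neighborhood threshold T is omitted), extension is ungapped and shown in one direction, and the drop-off value `x=20` is an assumed figure. W=11, M=5 and N=-4 follow the BLASTN defaults quoted above; the function names are hypothetical.

```python
def find_word_hits(query, subject, w=11):
    """Yield (query_pos, subject_pos) for every exact shared word of length w."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            yield i, j

def extend_right(query, subject, i, j, m=5, n=-4, x=20):
    """Extend an ungapped alignment rightward from (i, j).

    Stops when the score drops by x from its maximum, when the score
    reaches zero or below, or when either sequence ends.
    Returns (best_score, length_at_best_score).
    """
    score = best = 0
    best_len = 0
    k = 0
    while i + k < len(query) and j + k < len(subject):
        score += m if query[i + k] == subject[j + k] else n
        k += 1
        if score > best:
            best, best_len = score, k
        if best - score >= x or score <= 0:
            break
    return best, best_len

query = "AAAACCCCGGGGTTTT"
subject = "TTAAAACCCCGGGGAA"
hits = list(find_word_hits(query, subject))
hsp = extend_right(query, subject, *hits[0])
```

In this example the two sequences share a 12-residue stretch, which produces two overlapping 11-residue seeds; extending the first seed yields a score of 60 over 12 matched positions.
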

In addition to calculating percent sequence identity, the BLAST algorithm 
also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin 
& Altschul, Proc. Nat'l. Acad. Sci. USA 90:5873-5787 (1993)). One measure of similarity
provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an
indication of the probability by which a match between two nucleotide or amino acid
sequences would occur by chance. For example, a nucleic acid is considered similar to a
reference sequence if the smallest sum probability in a comparison of the test nucleic acid to
the reference nucleic acid is less than about 0.1, more preferably less than about 0.01, and
most preferably less than about 0.001.

Another indication that two nucleic acid sequences are substantially identical/
homologous is that the two molecules hybridize to each other under stringent conditions.
The phrase "hybridizing specifically to," refers to the binding, duplexing, or hybridizing of a
molecule only to a particular nucleotide sequence under stringent conditions, including when
that sequence is present in a complex mixture (e.g., total cellular) DNA or RNA. "Bind(s)
substantially" refers to complementary hybridization between a probe nucleic acid and a
target nucleic acid and embraces minor mismatches that can be accommodated by reducing
the stringency of the hybridization media to achieve the desired detection of the target
polynucleotide sequence.

"Stringent hybridization conditions" and "stringent hybridization wash 
conditions" in the context of nucleic acid hybridization experiments such as Southern and 
northern hybridizations are sequence dependent, and are different under different
environmental parameters. Longer sequences hybridize specifically at higher temperatures.
An extensive guide to the hybridization of nucleic acids is found in Tijssen, Laboratory
Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic
Acid Probes, part I, chapter 2 (1993), "Overview of principles of hybridization and the
strategy of nucleic acid probe assays," Elsevier, New York. Generally, highly stringent
hybridization and wash conditions are selected to be about 5 °C lower than the thermal
melting point (Tm) for the specific sequence at a defined ionic strength and pH. Typically, 
under "stringent conditions" a probe will hybridize to its target subsequence, but not to 
unrelated sequences. 

The Tm is the temperature (under defined ionic strength and pH) at which
50% of the target sequence hybridizes to a perfectly matched probe. Very stringent
conditions are selected to be equal to the Tm for a particular probe. An example of stringent
hybridization conditions for hybridization of complementary nucleic acids which have more
than 100 complementary residues on a filter in a Southern or northern blot is 50%
formamide with 1 mg of heparin at 42 °C, with the hybridization being carried out overnight.
An example of highly stringent wash conditions is 0.15 M NaCl at 72 °C for about 15
minutes. An example of stringent wash conditions is a 0.2x SSC wash at 65 °C for 15
minutes (see, Sambrook, infra, for a description of SSC buffer). Often, a high stringency
wash is preceded by a low stringency wash to remove background probe signal. An example
medium stringency wash for a duplex of, e.g., more than 100 nucleotides, is 1x SSC at 45 °C
for 15 minutes. An example low stringency wash for a duplex of, e.g., more than 100
nucleotides, is 4-6x SSC at 40 °C for 15 minutes. For short probes (e.g., about 10 to 50
nucleotides), stringent conditions typically involve salt concentrations of less than about 1.0
M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to
8.3, and the temperature is typically at least about 30 °C. Stringent conditions can also be
achieved with the addition of destabilizing agents such as formamide. In general, a signal to
noise ratio of 2x (or higher) than that observed for an unrelated probe in the particular
hybridization assay indicates detection of a specific hybridization. Nucleic acids which do
not hybridize to each other under stringent conditions are still substantially identical if the
polypeptides which they encode are substantially identical. This occurs, e.g., when a copy of
a nucleic acid is created using the maximum codon degeneracy permitted by the genetic
code.

A further indication that two nucleic acid sequences or polypeptides are 
substantially identical/homologous is that the polypeptide encoded by the first nucleic acid is 
immunologically cross reactive with, or specifically binds to, the polypeptide encoded by the 
second nucleic acid. Thus, a polypeptide is typically substantially identical to a second 
polypeptide, for example, where the two peptides differ only by conservative substitutions. 

"Conservatively modified variations" of a particular polynucleotide sequence 
refers to those polynucleotides that encode identical or essentially identical amino acid 
sequences, or where the polynucleotide does not encode an amino acid sequence, to 
essentially identical sequences. Because of the degeneracy of the genetic code, a large 
number of functionally identical nucleic acids encode any given polypeptide. For instance, 
the codons CGU, CGC, CGA, CGG, AGA, and AGG all encode the amino acid arginine.
Thus, at every position where an arginine is specified by a codon, the codon can be altered to 
any of the corresponding codons described without altering the encoded polypeptide. Such 
nucleic acid variations are "silent variations," which are one species of "conservatively 
modified variations." Every polynucleotide sequence described herein which encodes a 
polypeptide also describes every possible silent variation, except where otherwise noted. 
One of skill will recognize that each codon in a nucleic acid (except AUG, which is
ordinarily the only codon for methionine) can be modified to yield a functionally identical
molecule by standard techniques. Accordingly, each "silent variation" of a nucleic acid 
which encodes a polypeptide is implicit in each described sequence. 
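
The notion that every coding sequence implies a whole family of silent variations can be made concrete with a short enumeration. The sketch below uses a deliberately partial codon table (standard genetic code, RNA alphabet) covering only the residues in the example; the table name, function name and example peptide are chosen for illustration.

```python
from itertools import product

# Partial standard codon table (RNA alphabet); extend as needed.
SYNONYMOUS_CODONS = {
    "M": ["AUG"],                                      # methionine: single codon
    "W": ["UGG"],                                      # tryptophan: single codon
    "K": ["AAA", "AAG"],                               # lysine
    "R": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],   # arginine
}

def silent_variants(peptide):
    """Return every RNA coding sequence that encodes the given peptide."""
    choices = [SYNONYMOUS_CODONS[aa] for aa in peptide]
    return ["".join(codons) for codons in product(*choices)]

variants = silent_variants("MRK")
```

For the tripeptide Met-Arg-Lys there are 1 x 6 x 2 = 12 distinct coding sequences, all encoding the same polypeptide; the count grows multiplicatively with peptide length.
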

Furthermore, one of skill will recognize that individual substitutions, 
deletions or additions which alter, add or delete a single amino acid or a small percentage of 
amino acids (typically less than 5%, more typically less than 1%) in an encoded sequence are
"conservatively modified variations" where the alterations result in the substitution of an
amino acid with a chemically similar amino acid. Conservative substitution tables providing
functionally similar amino acids are well known in the art. The following five groups each
contain amino acids that are conservative substitutions for one another:

Aliphatic: Glycine (G), Alanine (A), Valine (V), Leucine (L), Isoleucine (I);
Aromatic: Phenylalanine (F), Tyrosine (Y), Tryptophan (W);
Sulfur-containing: Methionine (M), Cysteine (C);
Basic: Arginine (R), Lysine (K), Histidine (H);
Acidic: Aspartic acid (D), Glutamic acid (E), Asparagine (N), Glutamine (Q).

See also, Creighton (1984) Proteins, W.H. Freeman and Company. In addition, individual
substitutions, deletions or additions which alter, add or delete a single amino acid or a small
percentage of amino acids in an encoded sequence are also "conservatively modified variations."
Sequences that differ by conservative variations are generally homologous.
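
The five substitution groups listed above translate directly into a lookup. The following sketch encodes the groups exactly as given in the text (including the placement of asparagine and glutamine in the "acidic" group) and tests whether a single-residue change stays within one group; the identifiers are illustrative.

```python
# Groups as listed in the text, using one-letter amino acid codes.
CONSERVATIVE_GROUPS = {
    "aliphatic": "GAVLI",
    "aromatic": "FYW",
    "sulfur-containing": "MC",
    "basic": "RKH",
    "acidic": "DENQ",
}

# Map each residue to the name of its group for constant-time lookup.
GROUP_OF = {aa: name for name, members in CONSERVATIVE_GROUPS.items() for aa in members}

def is_conservative(original, replacement):
    """True if both residues belong to the same substitution group."""
    group = GROUP_OF.get(original)
    return group is not None and group == GROUP_OF.get(replacement)

def count_nonconservative(seq_a, seq_b):
    """Count positions where equal-length sequences differ non-conservatively."""
    return sum(
        x != y and not is_conservative(x, y)
        for x, y in zip(seq_a, seq_b)
    )
```

Residues that the text does not assign to any group (for example serine, threonine and proline) are treated here as having no conservative replacement, which is an assumption of this sketch.
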

A "subsequence" refers to a sequence of nucleic acids or amino acids that 
comprises a part of a longer sequence of nucleic acids or amino acids (e.g., polypeptide)
respectively. 

The term "gene" is used broadly to refer to any segment of DNA associated
with expression of a given RNA or protein. Thus, genes include regions encoding expressed
RNAs (which typically include polypeptide coding sequences) and, often, the regulatory
sequences required for their expression. Genes can be obtained from a variety of sources,
including cloning from a source of interest or synthesizing from known or predicted
sequence information, and may include sequences designed to have desired parameters.

The term "isolated", when applied to a nucleic acid or protein, denotes that 
the nucleic acid or protein is essentially free of other cellular components with which it is 
associated in the natural state. 

The term "nucleic acid" refers to deoxyribonucleotides or ribonucleotides and
polymers thereof in either single- or double-stranded form. Unless specifically limited, the
term encompasses nucleic acids containing known analogues of natural nucleotides which
have similar binding properties as the reference nucleic acid and are metabolized in a manner
similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic
acid sequence also implicitly encompasses conservatively modified variants thereof (e.g.,
degenerate codon substitutions) and complementary sequences as well as the sequence
explicitly indicated. Specifically, degenerate codon substitutions may be achieved by
generating sequences in which the third position of one or more selected (or all) codons is
substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res.
19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); Cassol et al. (1992);
Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). The term nucleic acid is generic to the
terms "gene," "DNA," "cDNA," "oligonucleotide," "RNA," "mRNA," "polynucleotide" and
the like.

"Nucleic acid derived from a gene" refers to a nucleic acid for whose
synthesis the gene, or a subsequence thereof, has ultimately served as a template. Thus, an
mRNA, a cDNA reverse transcribed from an mRNA, an RNA transcribed from that cDNA, a
DNA amplified from the cDNA, an RNA transcribed from the amplified DNA, etc., are all
derived from the gene and detection of such derived products is indicative of the presence
and/or abundance of the original gene and/or gene transcript in a sample.

A nucleic acid is "operably linked" when it is placed into a functional
relationship with another nucleic acid sequence. For instance, a promoter or enhancer is 
operably linked to a coding sequence if it increases the transcription of the coding sequence. 

A "recombinant expression cassette" or simply an "expression cassette" is a 
nucleic acid construct, generated recombinantly or synthetically, with nucleic acid elements
that are capable of effecting expression of a structural gene in hosts compatible with such
sequences. Expression cassettes include at least promoters and optionally, transcription
termination signals. Typically, the recombinant expression cassette includes a nucleic acid
to be transcribed (e.g., a nucleic acid encoding a desired polypeptide), and a promoter.
Additional factors necessary or helpful in effecting expression may also be used as described
herein. For example, an expression cassette can also include nucleotide sequences that
encode a signal sequence that directs secretion of an expressed protein from the host cell.
Transcription termination signals, enhancers, and other nucleic acid sequences that influence
gene expression, can also be included in an expression cassette.

The term "NAD(P)H" is used herein to refer to the reducing agents, NADH and NADPH.

"Regioselectivity" is used herein to refer to the ability to discriminate 
between different positions of the monooxygenase target. 

"Chemoselectivity" is used herein to refer to the ability to discriminate 
between two or more potential sites of action in the monooxygenase target (e.g., alkyl
hydroxylation in the presence of an epoxide and the like). 

"Stereoselectivity" is used herein to refer to the ability to discriminate 
between enantiomeric sites in the monooxygenase target. 
"Alkyl" refers to straight- and branched-chain, saturated and unsaturated
hydrocarbons. "Lower alkyl", as used herein, refers to "alkyl" groups having from about 1
to about 6 carbon atoms.

"Substituted alkyl" refers to alkyl as just described including one or more
functional groups such as lower alkyl, aryl, acyl, halogen (i.e., alkylhalos, e.g., CF3),
hydroxy, amino, alkoxy, alkylamino, acylamino, acyloxy, aryloxy, aryloxyalkyl, mercapto,
both saturated and unsaturated cyclic hydrocarbons, heterocycles and the like. These groups
may be attached to any carbon of the alkyl moiety.

The term "aryl" is used herein to refer to an aromatic substituent which may 
be a single aromatic ring or multiple aromatic rings which are fused together, linked
covalently, or linked to a common group such as a methylene or ethylene moiety. The
common linking group may also be a carbonyl as in benzophenone. The aromatic ring(s)
may include phenyl, naphthyl, biphenyl, diphenylmethyl and benzophenone among others.
The term "aryl" encompasses "arylalkyl."

The term "alkylarene" is used herein to refer to a subset of "aryl" in which the
aryl group is substituted with an alkyl group as defined herein.

"Substituted aryl" refers to aryl as just described including one or more 
functional groups such as lower alkyl, acyl, halogen, alkylhalos (e.g. CF3), hydroxy, amino, 
alkoxy, alkylamino, acylamino, acyloxy, mercapto and both saturated and unsaturated cyclic 
hydrocarbons which are fused to the aromatic ring(s), linked covalently or linked to a
common group such as a methylene or ethylene moiety. The linking group may also be a
carbonyl such as in cyclohexyl phenyl ketone. The term "substituted aryl" encompasses 
"substituted arylalkyl." 

The term "acyl" is used to describe a ketone substituent, -C(O)R, wherein R
is alkyl or substituted alkyl, aryl or substituted aryl as defined herein.

The term "halogen" is used herein to refer to fluorine, bromine, chlorine and iodine atoms.

The term "hydroxy" is used herein to refer to the group -OH.

The term "amino" is used to describe primary amines, R-NH2, wherein R is
alkyl or substituted alkyl, aryl or substituted aryl as defined herein.

The term "alkoxy" is used herein to refer to the -OR group, wherein R is a
lower alkyl, substituted lower alkyl, aryl, substituted aryl, arylalkyl or substituted arylalkyl
wherein the alkyl, aryl, substituted aryl, arylalkyl and substituted arylalkyl groups are as
described herein. Suitable alkoxy radicals include, for example, methoxy, ethoxy, phenoxy,
substituted phenoxy, benzyloxy, phenethyloxy, t-butoxy, etc.

The term "alkylamino" denotes secondary and tertiary amines wherein the 
alkyl groups may be either the same or different and may consist of straight or branched, 
saturated or unsaturated hydrocarbons.

The term "unsaturated cyclic hydrocarbon" is used to describe a non-aromatic
group with at least one double bond, such as cyclopentene, cyclohexene, etc. and substituted
analogues thereof.

The term "heteroaryl" as used herein refers to aromatic rings in which one or 
more carbon atoms of the aromatic ring(s) are substituted by a heteroatom such as nitrogen,
oxygen or sulfur. Heteroaryl refers to structures which may be a single aromatic ring,
multiple aromatic ring(s), or one or more aromatic rings coupled to one or more
non-aromatic ring(s). In structures having multiple rings, the rings can be fused together, linked
covalently, or linked to a common group such as a methylene or ethylene moiety. The
common linking group may also be a carbonyl as in phenyl pyridyl ketone. As used herein,
rings such as thiophene, pyridine, isoxazole, phthalimide, pyrazole, indole, furan, etc. or
benzo-fused analogues of these rings are defined by the term "heteroaryl."

"Alkylheteroaryl" defines a subset of "heteroaryl" substituted with an alkyl
group, as defined herein.

"Substituted heteroaryl" refers to heteroaryl as just described wherein the
heteroaryl nucleus is substituted with one or more functional groups such as lower alkyl,
acyl, halogen, alkylhalos (e.g., CF3), hydroxy, amino, alkoxy, alkylamino, acylamino,
acyloxy, mercapto, etc. Thus, substituted analogues of heteroaromatic rings such as
thiophene, pyridine, isoxazole, phthalimide, pyrazole, indole, furan, etc. or benzo-fused
analogues of these rings are defined by the term "substituted heteroaryl."

The term "heterocyclic" is used herein to describe a saturated or unsaturated
non-aromatic group having a single ring or multiple condensed rings from about 1 to about
12 carbon atoms and from about 1 to about 4 heteroatoms selected from nitrogen, sulfur or
oxygen within the ring. Such heterocycles are, for example, tetrahydrofuran, morpholine,
piperidine, pyrrolidine, etc.

The term "substituted heterocyclic" as used herein describes a subset of 
"heterocyclic" wherein the heterocycle nucleus is substituted with one or more functional 
groups such as lower alkyl, acyl, halogen, alkylhalos (e.g., CF3), hydroxy, amino, alkoxy,
alkylamino, acylamino, acyloxy, mercapto, etc.

The term "alkylheterocyclyl" defines a subset of "heterocyclic" substituted
with an alkyl group, as defined herein.

The term "substituted heterocyclicalkyl" defines a subset of "heterocyclic
alkyl" wherein the heterocyclic nucleus is substituted with one or more functional groups
such as lower alkyl, acyl, halogen, alkylhalos (e.g., CF3), hydroxy, amino, alkoxy,
alkylamino, acylamino, acyloxy, mercapto, etc.

Introduction

This invention describes the generation of evolved monooxygenases with 
enhanced performance for use in the production of chemicals of industrial interest using any 
of a variety of shuffling techniques, including, for example, gene, family and whole genome 
shuffling as described herein. In this invention, shuffling is used to enhance properties of 
monooxygenases, such as forward rate kinetics, substrate specificity, regioselectivity,
chemoselectivity, stereoselectivity and affinity and also to decrease susceptibility of 
monooxygenases to reversible inhibitors and inactivation by solvents, starting materials and 
reaction products and intermediates generated during the catalytic cycle. 

While much of the discussion below deals explicitly with P450
monooxygenases, this is for clarity of illustration. The discussion is representative of the
chemistries and improvements which can be made to other useful monooxygenases, such as
the structurally and functionally similar peroxidases and chloroperoxidases, as well as to the
structurally unrelated iron-sulfur methane monooxygenases and other enzymes noted herein
using the gene and family shuffling methodologies described.

In a first aspect, the present invention provides a method for obtaining a
nucleic acid that encodes an improved polypeptide possessing monooxygenase activity. The
improved polypeptide has at least one property improved over a naturally occurring
monooxygenase polypeptide. The method includes: (a) creating a library of recombinant
polynucleotides encoding a recombinant monooxygenase polypeptide; and (b) screening the
library to identify a recombinant polynucleotide that encodes an improved recombinant
monooxygenase polypeptide that has at least one property improved over the naturally
occurring polypeptide. Also provided are nucleic acids produced by this method that encode
a monooxygenase polypeptide having at least one property improved over a naturally
occurring monooxygenase polypeptide.

In a preferred embodiment, the nucleic acid libraries of the invention are
constructed by a method that includes shuffling a plurality of parental polynucleotides to
produce one or more recombinant monooxygenase polynucleotides encoding the improved
property. In another preferred embodiment, the polynucleotides are homologous. A detailed
description of shuffling techniques is provided in Part A, hereinbelow.

In another embodiment, at least one of the parental polynucleotides is
selected from polynucleotides that encode at least one monooxygenase activity and those
that do not encode at least one monooxygenase activity. Typically, the parental
monooxygenase polynucleotide encodes a complete polypeptide or a polypeptide fragment
selected from an arene monooxygenase or fragments thereof.

In a preferred embodiment, the monooxygenase activity is a member selected
from alkane oxidation (e.g., hydroxylation, formation of ketones, aldehydes, etc.), alkene
epoxidation, aromatic hydroxylation, N-dealkylation (e.g., of alkylamines), S-dealkylation
(e.g., of reduced thio-organics), O-dealkylation (e.g., of alkyl ethers), oxidation of aryloxy
phenols, conversion of aldehydes to acids, alcohols to aldehydes or ketones,
dehydrogenation, decarbonylation, oxidative dehalogenation of haloaromatics and
halohydrocarbons, Baeyer-Villiger monooxygenation, modification of cyclosporins,
hydroxylation of mevastatin, hydroxylation of erythromycin, hydroxylations of fatty acids,
hydroxylation/epoxidation of terpenes, N-hydroxylation, sulfoxide formation, or
oxygenation of sulfonylureas. Other oxidative transformations will be apparent to those of
skill in the art.

The invention provides significant advantages over previously used methods
for optimization of monooxygenase genes. For example, DNA shuffling can result in
optimization of a desirable property even in the absence of a detailed understanding of the
mechanism by which the particular property is mediated. In addition, entirely new
properties can be obtained upon shuffling of DNAs, i.e., shuffled DNAs can encode
polypeptides or RNAs with properties entirely absent in the parental DNAs which are
shuffled.

The properties or characteristics that can be acquired or improved vary
widely, and depend on the choice of substrate. For example, for monooxygenase genes,
properties that one can improve include, but are not limited to, increased range of
monooxygenase activity encoded by a particular gene, increased potency against a
monooxygenase target, increased regioselectivity of action against a monooxygenase target,
increased chemoselectivity of action against a monooxygenase target, increased
stereoselectivity of action against a monooxygenase target, increased expression level of the
monooxygenase gene, increased tolerance of the protein encoded by the monooxygenase
gene to protease degradation (or other natural protein or RNA degradative processes),
increased monooxygenase activity ranges for conditions such as heat, cold, low or high pH,
reduced toxicity to the host cell, and increased resistance of the polypeptide and/or the
organism expressing the polypeptide to organic solvents and to reaction feedstocks,
intermediates and products.

The targets for modification vary in different applications, as does the 
property sought to be acquired or improved. Examples of candidate targets for acquisition of 
a property or improvement in a property include genes that encode proteins which have 
enzymatic or other activities useful in monooxygenase reactions. 

The methods typically use at least two variant forms of a starting target. The
variant forms of candidate substrates can show substantial sequence or secondary structural
similarity with each other, but they should also differ in at least one and preferably at least
two positions.

The initial diversity between forms can be the result of natural variation, e.g.,
the different variant forms (homologs) are obtained from different individuals or strains of
an organism, or constitute related sequences from the same organism (e.g., allelic
variations), or constitute homologs from different organisms (interspecific variants).
Alternatively, initial diversity can be induced, e.g., the variant forms can be generated by
error-prone transcription, such as an error-prone PCR or use of a polymerase which lacks
proof-reading activity (see, Liao, Gene 88:107-111 (1990)), of the first variant form, or by
replication of the first form in a mutator strain (mutator host cells are discussed in further
detail below, and are generally well known). Alternatively, initial diversity can be generated
by the creation of chimeric nucleic acids. The initial diversity between substrates is greatly
augmented in subsequent steps of recombination for library generation.

A mutator strain can include any mutants in any organism impaired in the
functions of mismatch repair. These include mutant gene products of mutS, mutT, mutH,
mutL, uvrD, dcm, vsr, umuC, umuD, sbcB, recJ, etc. The impairment is achieved by genetic
mutation, allelic replacement, selective inhibition by an added reagent such as a small
molecule or an expressed antisense RNA, or other techniques. Impairment can be of the
genes noted, or of homologous genes in any organism.

Therefore, in carrying out the practice of the present invention, at least two 
variant forms of a nucleic acid which can confer monooxygenase activity are recombined to 
produce a library of recombinant monooxygenase genes. The library is then screened to
identify at least one recombinant monooxygenase gene that is optimized for the particular 
property or properties of interest. 

The parental polynucleotides can be shuffled in substantially any cell type,
including prokaryotes, eukaryotes, yeast, bacteria and fungi. In a preferred embodiment, the
one or more recombinant monooxygenase nucleic acids are present in one or more bacterial,
yeast, or fungal cells and the method includes: pooling multiple separate monooxygenase
nucleic acids; screening the resulting pooled monooxygenase nucleic acids to identify
distinct or improved recombinant monooxygenase nucleic acids that exhibit distinct or
improved monooxygenase activity compared to a non-recombinant monooxygenase activity
nucleic acid; and cloning the distinct or improved recombinant nucleic acid.

Often, improvements are achieved after one round of recombination and
selection. However, recursive sequence recombination can be employed to achieve still
further improvements in a desired property, or to bring about new (or "distinct") properties.
Recursive sequence recombination entails successive cycles of recombination to generate
molecular diversity. That is, one creates a family of nucleic acid molecules showing some
sequence identity to each other but differing in the presence of mutations. In any given
cycle, recombination can occur in vivo or in vitro, intracellularly or extracellularly.
Furthermore, diversity resulting from recombination can be augmented in any cycle by
applying prior methods of mutagenesis (e.g., error-prone PCR or cassette mutagenesis) to
either the substrates or products for recombination.

A recombination cycle is usually followed by at least one cycle of screening
or selection for molecules having a desired property or characteristic. If a recombination
cycle is performed in vitro, the products of recombination, i.e., recombinant segments, are
sometimes introduced into cells before the screening step. Recombinant segments can also
be linked to an appropriate vector or other regulatory sequences before screening.
Alternatively, products of recombination generated in vitro are sometimes packaged in
viruses (e.g., bacteriophage) before screening. If recombination is performed in vivo,
recombination products can sometimes be screened in the cells in which recombination

WO 00/09682 PCT/US99/18424

occurred. In other applications, recombinant segments are extracted from the cells, and
optionally packaged as viruses, before screening. 

The nature of screening or selection depends on what property or
characteristic is to be acquired or the property or characteristic for which improvement is
sought, and many examples are discussed below. It is not usually necessary to understand
the molecular basis by which particular products of recombination (recombinant segments)
have acquired new or improved properties or characteristics relative to the starting
substrates. For example, a monooxygenase gene can have many component sequences each
having a different intended role (e.g., coding sequence, regulatory sequences, targeting
sequences, stability-conferring sequences, subunit sequences and sequences affecting
integration). Each of these component sequences can be varied and recombined
simultaneously. Screening/selection can then be performed, for example, for recombinant
segments that have increased ability to confer monooxygenase activity upon a cell without
the need to attribute such improvement to any of the individual component sequences of the
vector.

Depending on the particular screening protocol used for a desired property,
initial round(s) of screening can sometimes be performed using bacterial cells due to high
transfection efficiencies and ease of culture. However, for eukaryotic monooxygenases such
as eukaryotic arene monooxygenases, bacterial expression is often not practical, and yeast,
fungal or other eukaryotic systems are used for library expression and screening. Similarly,
other types of screening that are not amenable to screening in bacterial or simple
eukaryotic library cells are performed in cells selected for use in an environment close to
that of their intended use. Final rounds of screening can be performed in the precise cell
type of intended use.

If further improvement in a property is desired, at least one and usually a
collection of recombinant segments surviving a first round of screening/selection are subject
to a further round of recombination. These recombinant segments can be recombined with
each other or with exogenous segments representing the original substrates or further
variants thereof. Again, recombination can proceed in vitro or in vivo. If the previous
screening step identifies desired recombinant segments as components of cells, the
components can be subjected to further recombination in vivo, or can be subjected to further
recombination in vitro, or can be isolated before performing a round of in vitro
recombination. Conversely, if the previous screening step identifies desired recombinant
segments in naked form or as components of viruses, these segments can be introduced into
cells to perform a round of in vivo recombination. The second round of recombination,
irrespective of how it is performed, generates further recombinant segments which encompass
more diversity than is present in recombinant segments resulting from previous rounds.

The second round of recombination can be followed by a further round of
screening/selection according to the principles discussed above for the first round. The
stringency of screening/selection can be increased between rounds. Also, the nature of the
screen and the property being screened for can vary between rounds if improvement in more
than one property is desired or if acquiring more than one new property is desired.
Additional rounds of recombination and screening can then be performed until the
recombinant segments have sufficiently evolved to acquire the desired new or improved
property or function.
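
The cycle of recombination followed by screening can be illustrated with a small simulation. The following is only a toy sketch under stated assumptions: sequences are short strings of equal length, recombination is a single random crossover, and the "screen" is a hypothetical scoring function (`matches_target`) that counts matches to an arbitrary target string. None of these names or parameters come from the specification, and real screens measure enzyme properties rather than string similarity.

```python
import random

def recombine(parent_a, parent_b, rng):
    """Return a single-crossover recombinant of two equal-length sequences."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def evolve(parents, fitness, rounds=5, library_size=200, keep=10, seed=0):
    """Toy recursive sequence recombination: recombine, screen, repeat."""
    rng = random.Random(seed)
    pool = list(parents)
    for _ in range(rounds):
        # Library generation: random pairwise crossovers within the current pool.
        library = [recombine(*rng.sample(pool, 2), rng) for _ in range(library_size)]
        # Screening: keep the best-scoring members. The previous pool competes
        # too, so the best score can never decrease between rounds.
        candidates = set(library) | set(pool)
        pool = sorted(candidates, key=lambda s: (fitness(s), s), reverse=True)[:keep]
    return pool

TARGET = "GGGGGGGG"  # hypothetical optimum, used only to score this toy example

def matches_target(seq):
    """Hypothetical screen: number of positions that match TARGET."""
    return sum(x == y for x, y in zip(seq, TARGET))

parents = ["GGGGAAAA", "AAAAGGGG", "AAGGGGAA"]
evolved = evolve(parents, matches_target)
```

Raising the stringency between rounds, as the text describes, would correspond here to lowering `keep` or changing `fitness` from one round to the next.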

In a preferred embodiment, the invention provides a recursive method for 
making a nucleic acid encoding a specific monooxygenase activity. In this method, the 
parental nucleic acids are shuffled in a plurality of cells and the method optionally further 
includes one or more of: (a) recombining DNA from the plurality of cells that display 
monooxygenase activity with a library of DNA fragments, at least one of which undergoes 
recombination with a segment in a cellular DNA present in the cells to produce recombined 
cells, or recombining DNA between the plurality of cells that display monooxygenase 
activity to produce cells with modified monooxygenase activity; (b) recombining and 
screening the recombined or modified cells to produce further recombined cells that have 
evolved additionally modified monooxygenase activity; and, (c) repeating (a) or (b) until the 
further recombined cells have acquired a desired monooxygenase activity. 

In another preferred embodiment, the invention provides a method for making
a nucleic acid encoding a specific monooxygenase activity. This method includes: (a)
recombining at least one distinct or improved recombinant nucleic acid with a further
monooxygenase activity nucleic acid, which further nucleic acid is the same or different
from one or more of the plurality of parental nucleic acids to produce a library of
recombinant monooxygenase nucleic acids; (b) screening the library to identify at least one
further distinct or improved recombinant monooxygenase nucleic acid that exhibits a further
improvement or distinct property compared to the plurality of parental nucleic acids; and,
optionally; (c) repeating (a) and (b) until the resulting further distinct or improved
recombinant nucleic acid shows an additionally distinct or improved monooxygenase
property.

The practice of this invention involves the construction of recombinant
nucleic acids and the expression of genes in transfected host cells. Molecular cloning
techniques to achieve these ends are known in the art. A wide variety of cloning and in vitro
amplification methods suitable for the construction of recombinant nucleic acids such as
expression vectors are well-known to persons of skill. General texts which describe
molecular biological techniques useful herein, including mutagenesis, include Berger and
Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology, volume
152, Academic Press, Inc., San Diego, CA ("Berger"); Sambrook et al., Molecular Cloning:
A Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold
Spring Harbor, New York, 1989 ("Sambrook") and Current Protocols in Molecular
Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene
Publishing Associates, Inc. and John Wiley & Sons, Inc., (supplemented through 1998)
("Ausubel"). Examples of techniques sufficient to direct persons of skill through in vitro
amplification methods, including the polymerase chain reaction (PCR), the ligase chain
reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques
(e.g., NASBA) are found in Berger, Sambrook, and Ausubel, as well as Mullis et al., U.S.
Patent No. 4,683,202 (1987); PCR Protocols: A Guide to Methods and Applications
(Innis et al. eds.), Academic Press, Inc., San Diego, CA (1990) ("Innis"); Arnheim & Levinson
(October 1, 1990) C&EN 36-47; The Journal Of NIH Research 3:81-94 (1991); Kwoh et
al., Proc. Natl. Acad. Sci. USA 86:1173 (1989); Guatelli et al., Proc. Natl. Acad. Sci. USA
87:1874 (1990); Lomell et al., J. Clin. Chem. 35:1826 (1989); Landegren et al., Science
241:1077-1080 (1988); Van Brunt, Biotechnology 8:291-294 (1990); Wu and Wallace, Gene
4:560 (1989); Barringer et al., Gene 89:117 (1990); and Sooknanan and Malek,
Biotechnology 13:563-564 (1995). Improved methods of cloning in vitro amplified nucleic
acids are described in Wallace et al., U.S. Pat. No. 5,426,039. Improved methods of
amplifying large nucleic acids by PCR are summarized in Cheng et al., Nature 369:684-685
(1994) and the references cited therein, in which PCR amplicons of up to 40 kb are
generated. One of skill will appreciate that essentially any RNA can be converted into a
double stranded DNA suitable for restriction digestion, PCR expansion and sequencing
using reverse transcriptase and a polymerase. See, Ausubel, Sambrook and Berger, all supra.

In another aspect, the present invention provides a method of increasing
monooxygenase activity in a cell. The method includes performing whole genome shuffling
of a plurality of genomic nucleic acids in the cell and selecting for one or more
monooxygenase activities. In this aspect of the invention, the genomic nucleic acids can be
from substantially any source. In a preferred embodiment of this aspect of the invention, the
genomic nucleic acids are from a species or strain different from the cell. In a further
preferred embodiment, the cell is of prokaryotic or eukaryotic origin.

Substantially any monooxygenase property can be selected for using the
methods of the invention. A preferred property is the activity of the polypeptide towards a
particular class of substrates. In a preferred embodiment, the monooxygenase property is its
ability to effect alkene epoxidation, alkane oxidation (e.g., hydroxylation, conversion to
carboxylic acid, etc.), aromatic hydroxylation, N-dealkylation of alkylamines, S-dealkylation
of reduced thio-organics, O-dealkylation of alkyl ethers, oxidation of aryloxy phenols,
conversion of aldehydes to acids, dehydrogenation, decarbonylation, oxidative
dehalogenation of haloaromatics and halohydrocarbons, Baeyer-Villiger monooxygenation,
modification of cyclosporins, hydroxylation of mevastatin, hydroxylation of fatty acids,
hydroxylation/epoxidation of terpenes, conversion of cholesterol to pregnenolone, or
oxygenation of sulfonylureas.

In a third aspect, the invention provides a DNA shuffling mixture comprising:
at least three homologous DNAs, each of which is derived from a nucleic acid encoding a
polypeptide or polypeptide fragment which encodes monooxygenase activity. In a preferred
embodiment of this aspect of the invention, the at least three homologous DNAs are present
in cell culture or in vitro.

Oligonucleotides for use as probes, e.g., in in vitro amplification methods, for
use as gene probes, or as shuffling targets (e.g., synthetic genes or gene segments) are
typically synthesized chemically according to the solid phase phosphoramidite triester
method described by Beaucage and Caruthers, Tetrahedron Letts. 22(20):1859-1862 (1981),
e.g., using an automated synthesizer, as described in Needham-VanDevanter et al., Nucleic
Acids Res., 12:6159-6168 (1984). Oligonucleotides can also be custom made and ordered
from a variety of commercial sources known to persons of skill.

A. Formats for Sequence Recombination

The methods of the invention entail performing recombination ("shuffling")
and screening or selection to "evolve" individual genes, whole plasmids or viruses,
multigene clusters, or even whole genomes (Stemmer, Bio/Technology 13:549-553 (1995)).
Reiterative cycles of recombination and screening/selection can be performed to further
evolve the nucleic acids of interest. Such techniques do not require the extensive analysis
and computation required by conventional methods for polypeptide engineering. Shuffling
allows the recombination of large numbers of mutations in a minimum number of selection
cycles, in contrast to natural pair-wise recombination events (e.g., as occur during sexual
replication). Thus, the sequence recombination techniques described herein provide
particular advantages in that they provide recombination between mutations in any or all of
these, thereby providing a very fast way of exploring the manner in which different
combinations of mutations can affect a desired result. In some instances, however, structural
and/or functional information is available which, although not required for sequence
recombination, provides opportunities for modification of the technique.

Sequence recombination can be achieved in many different formats and
permutations of formats. Exemplary formats and examples for sequence recombination,
referred to, e.g., as "DNA shuffling," "fast forced evolution," or "molecular breeding," have
been described in the following patents and patent applications: US Patent Application Serial
No. 08/198,431, filed February 17, 1994, US Patent No. 5,605,793; PCT Application WO
95/22625 (Serial No. PCT/US95/02126), filed February 17, 1995; US Serial No. 08/425,684,
filed April 18, 1995; Serial No. 08/537,874, filed October 30, 1995, Serial No. 08/564,955,
filed November 30, 1995, Serial No. 08/621,859, filed March 25, 1996, US Serial No.
08/621,430, filed March 25, 1996; Serial No. PCT/US96/05480, filed April 18, 1996, Serial
No. 08/650,400, filed May 20, 1996, Serial No. PCT/US97/17300, filed September 26,
1997, Serial No. PCT/US97/24239, filed December 17, 1997; Serial No. 98/354,922, filed
July 15, 1999, Serial No. PCT/US98/05956, filed March 25, 1998; PCT Application WO
97/20078 (Serial No. PCT/US96/05480), filed April 18, 1996; PCT Application WO
97/35966, filed March 20, 1997; US Serial No. 08/675,502, filed July 3, 1996; US Serial No.
08/721,824, filed September 27, 1996; PCT Application WO 98/13487, filed September 26,
1997; "Evolution of Whole Cells and Organisms by Recursive Sequence Recombination"
Attorney Docket No. 018097-020720US filed July 15, 1998 by del Cardayre et al. (USSN
09/161,188); Stemmer, Science 270:1510 (1995); Stemmer et al., Gene 164:49-53 (1995);
Stemmer, Bio/Technology 13:549-553 (1995); Stemmer, Proc. Natl. Acad. Sci. U.S.A.
91:10747-10751 (1994); Stemmer, Nature 370:389-391 (1994); Crameri et al., Nature
Medicine 2(1):1-3 (1996); Crameri et al., Nature Biotechnology 14:315-319 (1996), and
PCT Application WO 98/42832 (Serial No. PCT/US98/05956), filed March 25, 1998, each
of which is incorporated by reference in its entirety for all purposes.

Gene shuffling and family shuffling provide two of the most powerful
methods available for improving and "migrating" (gradually changing the type of reaction,
substrate or activity of a selected enzyme) the functions of biocatalysts. In family shuffling,
homologous sequences, e.g., from different species or chromosomal positions, are
recombined. In gene shuffling, a single sequence is mutated or otherwise altered and then
recombined. These formats share some common principles.

The breeding procedure starts with at least two substrates that generally show
substantial sequence identity to each other (i.e., at least about 30%, 50%, 70%, 80% or 90%
sequence identity), but differ from each other at certain positions. The difference can be any
type of mutation, for example, substitutions, insertions and deletions. Often, different
segments differ from each other in about 5-20 positions. For recombination to generate
increased diversity relative to the starting materials, the starting materials must differ from
each other in at least two nucleotide positions. That is, if there are only two substrates, there
should be at least two divergent positions. If there are three substrates, for example, one
substrate can differ from the second at a single position, and the second can differ from the
third at a different single position. The starting DNA segments can be natural variants of
each other, for example, allelic or species variants. The segments can also be from
nonallelic genes showing some degree of structural and usually functional relatedness (e.g.,
different genes within a superfamily, such as the arene monooxygenase superfamily). The
starting DNA segments can also be induced variants of each other. For example, one DNA
segment can be produced by error-prone PCR replication of the other, or by substitution of a
mutagenic cassette. Induced mutants can also be prepared by propagating one (or both) of
the segments in a mutagenic strain. In these situations, strictly speaking, the second DNA
segment is not a single segment but a large family of related segments. The different
segments forming the starting materials are often the same length or substantially the same
length. However, this need not be the case; for example, one segment can be a subsequence
of another. The segments can be present as part of larger molecules, such as vectors, or can
be in isolated form.
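
The requirement for at least two divergent positions can be checked by enumeration. The sketch below is illustrative only and rests on simplifying assumptions: the two substrates are already aligned and of equal length, differences are substitutions only, and recombination is modelled as a single crossover. The function names and the example sequences are hypothetical.

```python
def divergent_positions(a, b):
    """Indices at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

def percent_identity(a, b):
    """Percentage of aligned positions that are identical."""
    return 100.0 * (len(a) - len(divergent_positions(a, b))) / len(a)

def novel_recombinants(a, b):
    """Single-crossover products of a and b that match neither parent."""
    products = set()
    for point in range(1, len(a)):
        products.add(a[:point] + b[point:])
        products.add(b[:point] + a[point:])
    return products - {a, b}

parent = "ACGTACGT"
one_difference = "ACGAACGT"   # differs from parent at one position
two_differences = "ACGAACCT"  # differs from parent at two positions

# One divergent position: every crossover just regenerates a parent.
no_new_diversity = novel_recombinants(parent, one_difference)
# Two divergent positions: crossovers between them give new combinations.
new_diversity = novel_recombinants(parent, two_differences)
```

With a single divergent position every crossover product is identical to one parent, so the library contains nothing new; with two divergent positions, a crossover between them yields sequences carrying each mutation on its own.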

The starting DNA segments are recombined by any of the sequence
recombination formats provided herein to generate a diverse library of recombinant DNA
segments. Such a library can vary widely in size from having fewer than 10 to more than

10^5, 10^9, 10^12 or more members. In some embodiments, the starting segments and the
recombinant libraries generated will include full-length coding sequences and any essential
regulatory sequences, such as a promoter and polyadenylation sequence, required for
expression. In other embodiments, the recombinant DNA segments in the library can be
inserted into a common vector providing sequences necessary for expression before
performing screening/selection.

1. Use of Restriction Enzyme Sites to Recombine Mutations 

In some situations it is advantageous to use restriction enzyme sites in nucleic
acids to direct the recombination of mutations in a nucleic acid sequence of interest. These
techniques are particularly preferred in the evolution of fragments that cannot readily be
shuffled by existing methods due to the presence of repeated DNA or other problematic
primary sequence motifs. These situations also include recombination formats in which it is
preferred to retain certain sequences unmutated. The use of restriction enzyme sites is also
preferred for shuffling large fragments (typically greater than 10 kb), such as gene clusters
that cannot be readily shuffled and "PCR-amplified" because of their size. Although
fragments up to 50 kb have been reported to be amplified by PCR (Barnes, Proc. Natl. Acad.
Sci. U.S.A. 91:2216-2220 (1994)), it can be problematic for fragments over 10 kb, and thus
alternative methods for shuffling in the range of 10 - 50 kb and beyond are preferred.
Preferably, the restriction endonucleases used are of the Class II type (Sambrook, Ausubel
and Berger, supra) and of these, preferably those which generate nonpalindromic sticky end
overhangs such as AlwN I, Sfi I or BstX I. These enzymes generate nonpalindromic ends that
allow for efficient ordered reassembly with DNA ligase. Typically, restriction enzyme (or
endonuclease) sites are identified by conventional restriction enzyme mapping techniques
(Sambrook, Ausubel, and Berger, supra), by analysis of sequence information for that gene,
or by introduction of desired restriction sites into a nucleic acid sequence by synthesis (i.e.,
by incorporation of silent mutations).
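
Why nonpalindromic overhangs permit ordered reassembly can be shown with a short sketch. It is a simplified model, not a description of any particular enzyme: each junction is represented only by the top-strand sequence of its overhang, the overhang sequences used are hypothetical, and the check ignores the further possibility that one overhang is the reverse complement of a different one.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(overhang):
    """True if the overhang equals its own reverse complement,
    i.e. it could anneal to another copy of itself."""
    return overhang == reverse_complement(overhang)

def assembly_order(fragments):
    """Return the single order in which fragments can be ligated.

    fragments maps a name to (left, right) overhangs; None marks an outer end.
    A fragment follows another when its left overhang equals the other's right.
    """
    junctions = [right for _, right in fragments.values() if right is not None]
    for overhang in junctions:
        if is_palindromic(overhang):
            raise ValueError(f"palindromic overhang {overhang} permits self-ligation")
    if len(set(junctions)) != len(junctions):
        raise ValueError("duplicate overhangs make the order ambiguous")
    by_left = {left: name for name, (left, _) in fragments.items()}
    order = [by_left[None]]  # start at the fragment whose left end is an outer end
    while fragments[order[-1]][1] is not None:
        order.append(by_left[fragments[order[-1]][1]])
    return order

# Hypothetical three-fragment digest with distinct, nonpalindromic overhangs.
digest = {"B": ("ACT", "GTA"), "A": (None, "ACT"), "C": ("GTA", None)}
order = assembly_order(digest)
```

A palindromic overhang such as `AATT` would be rejected by this check, because fragments bearing it could ligate to themselves or in either orientation, which is what makes ordered reassembly inefficient.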

Expression of Herbicide-Binding Polypeptides in Plants to
Produce Herbicide Tolerance

Description

The present invention relates to a method for producing herbicide-tolerant
plants by expressing an exogenous herbicide-binding polypeptide in plants or plant parts.
The invention further relates to the use of the corresponding nucleic acids coding for a
polypeptide, an antibody, or parts of an antibody with herbicide-binding properties in
transgenic plants, and to the plant itself transformed in this way.

It is known that genetic engineering methods can be used to transfer foreign
genes into the genome of a plant in a targeted manner. This process is called transformation,
and the resulting plants are called transgenic plants. Transgenic plants are currently used in
various areas of biotechnology. Examples are insect-resistant plants (Vaek et al. Plant Cell 5
(1987), 159-169), virus-resistant plants (Powell et al. Science 232 (1986), 738-743) and
ozone-resistant plants (Van Camp et al. BioTech. 12 (1994), 165-168). Examples of quality
improvements achieved by genetic engineering are: increased shelf life of fruit (Oeller et al.
Science 254 (1991), 437-439), increased starch production in potato tubers (Stark et al.
Science 242 (1992), 419), modification of starch composition (Visser et al. Mol. Gen. Genet.
225 (1991), 289-296) and lipid composition (Voelker et al. Science 257 (1992), 72-74), and
production of polymers foreign to plants (Poirer et al. Science 256 (1992), 520-523).

An important goal of work in plant molecular genetics is the production of
herbicide tolerance. Herbicide tolerance is characterized by an increase, in kind or degree,
in the tolerance of the plant or of plant parts to the applied herbicide. This can be achieved
in various ways. The known methods are the use of a metabolism gene, such as the pat gene
in connection with glufosinate resistance (WO 8705629), or of a target enzyme that is
resistant to the herbicide, as in the case of enolpyruvylshikimate-3-phosphate synthase
(WO 9204449), which is resistant to glyphosate, as well as the use of a herbicide in cell and
tissue culture to select tolerant plant cells and the resistant plants resulting from them, as
described for acetyl-CoA carboxylase inhibitors (US 5162602, US 5290696).

Antibodies are proteins that form part of the immune system. Common to all
antibodies are their spatial, globular structure, their composition of light and heavy chains,
and their fundamental ability to bind molecules or parts of a molecular structure with high
specificity (Alberts et al., in: Molekularbiologie der Zelle, 2nd edition 1990, VCH Verlag,
ISBN 3-527-27983-0, 1198-1237). Because of these properties, antibodies have been used
for a wide variety of tasks. A distinction is made between the use of antibodies within the
animal and human organisms that produce them, the so-called in situ applications, and the
ex situ applications, i.e., the use of antibodies after isolation from the producing cells or
organisms (Whitelam and Cockburn, TIPS vol. 1, 8 (1996), 268-272).

The use of hybrid somatic cell lines (hybridomas) as a source of antibodies against specific antigens goes back to the work of Köhler and Milstein (Nature 256 (1975) 495-97). This method yields so-called monoclonal antibodies, which have a uniform structure and are produced by cell fusion. Spleen cells of an immunized mouse are fused with cells of a mouse myeloma. This gives rise to hybridoma cells that proliferate indefinitely. At the same time, the cells secrete specific antibodies against the antigen with which the mouse was immunized. The spleen cells supply the ability to produce antibodies, while the myeloma cells contribute the unlimited growth capacity and the continuous antibody secretion. Since each hybridoma cell is derived as a clone from a single B cell, all antibody molecules produced have the same structure, including the antigen-binding site. This method has greatly promoted the use of antibodies, since antibodies with a single, known specificity and a homogeneous structure are now available in unlimited quantities. Monoclonal antibodies are widely used in immunodiagnostics and as therapeutics.

For some years there has been the so-called phage display method for producing antibodies, in which the immune system and the various immunizations in the animal are bypassed. Here the affinity and specificity of the antibody are tailored in vitro (Winter et al., Ann. Rev. Immunol. 12 (1994), 433-455; Hoogenboom, TIBTech vol 15 (1997), 62-70). Gene segments that contain the coding sequence of the variable region of antibodies, i.e. the antigen-binding site, are fused with genes for the coat protein of a bacteriophage. Bacteria are then infected [...] the phage that contains [...] and specifically binds the antigen can now be isolated. Each phage isolated in this way produces a monoclonal, antigen-binding polypeptide that corresponds to a monoclonal antibody. The genes for the antigen-binding site [...] can be isolated from the phage DNA and used for the construction of complete antibody genes.

In the field of plant protection, antibodies have been used in particular ex situ for the qualitative and [...] drinking water and soil samples (WO 9423018) or in [...] as aids for the purification of bound molecules.

The production of immunoglobulins in [...] was described [...] (Hiatt et al., Nature 342 (1989), 76-78). The [...] secretory antibodies (J. Ma and Mich Hein, [...] New York Academy of Sciences, 72-81).

[...] specific antibodies or parts [...] in plant cells ([...] (1993), 469-472; Voss et al., Mol. Breeding 1 (1995), 39-50).

An analogous [...]. Examples are known [...] that use antibody expression in plants for oral immunization (Ma et al. [...] (1995) 716-719; Mason and Arntzen, Tibtech Vol 13 [...]). [...] a single-chain antibody (scFv) [...] the low-molecular-weight plant hormone abscisic acid [...] a reduced plant hormone availability due to abscisic acid binding in the plant was observed (Artsaenko et al., The Plant Journal (1995) 8 (5), 745-750).

Chemical weed control in agriculturally important crops requires the use of highly selective herbicides. In some cases, however, it is difficult to develop herbicides with sufficient selectivity that cause no damage to the crop plant. The introduction of herbicide-resistant or herbicide-tolerant crop plants can help to solve this problem.

There are limits to the development of herbicide-resistant crop plants by tissue culture or seed mutagenesis and natural selection. For example, only those plants that can be regenerated into whole plants from cell cultures can be manipulated by tissue culture techniques. Moreover, after mutagenesis and selection, crop plants may show undesired properties that have to be removed again by backcrossing, in some cases repeatedly. In addition, introducing a resistance by crossing would be restricted to plants of the same species.

For these reasons, the genetic engineering approach of isolating a gene coding for the resistance and transferring it in a targeted manner into crop plants is superior to classical breeding methods.

The molecular-biological development of herbicide-tolerant or herbicide-resistant crop plants has so far required that the mechanism of action of the herbicide in the plant is known and that genes conferring resistance to the herbicide can be found. Many herbicides in commercial use today act by blocking an enzyme of an essential amino acid, lipid or pigment biosynthesis pathway. Herbicide tolerance can be generated by modifying the genes of these enzymes such that the herbicide can no longer be bound, and by introducing these modified genes into crop plants. Alternatively, analogous enzymes that show a natural resistance to the herbicide can be found in nature, for example in microorganisms. The gene conferring this resistance is isolated from such a microorganism, recloned into suitable vectors and, after successful transformation, expressed in herbicide-sensitive crop plants (WO 96/38567).

The object of the present invention was to develop a new, generally applicable genetic engineering method for generating herbicide-tolerant transgenic plants.

Surprisingly, this object was achieved by a method of expressing, in the plants, an exogenous polypeptide, antibody or parts of an antibody with herbicide-binding properties.



A first subject of the present invention relates to the production of a herbicide-binding antibody and the cloning of the associated gene or gene fragment.

First, a suitable antibody that binds the herbicide is generated. This can be done, among other ways, by immunizing a vertebrate, usually a mouse, rat, dog, horse, donkey or goat, with an antigen. The antigen is a herbicidally active compound that is coupled to or associated with a higher-molecular-weight carrier, such as bovine serum albumin (BSA), chicken egg white protein (ovalbumin), keyhole limpet hemocyanin (KLH) or other carriers, via a functional group. After repeated antigen application, the immune response is monitored by standard methods and a suitable antiserum is thus isolated. This approach initially yields a polyclonal serum that contains antibodies with different specificities. For targeted in-situ use it is necessary to isolate the gene sequence coding for a single specific monoclonal antibody. Various routes are available for this purpose. One approach uses the fusion of antibody-producing cells with cancer cells to give a hybridoma cell culture that continuously produces antibodies and that, by isolating the clones it contains, ultimately leads to a homogeneous cell line producing a defined monoclonal antibody.

From such a monoclonal cell line, the cDNA for the antibody or for parts of the antibody, the so-called single-chain antibody (scFv), is isolated. These cDNA sequences can then be cloned into expression cassettes and used for functional expression in prokaryotic and eukaryotic organisms, including plants.

It is also possible to use phage display libraries to select antibodies that bind herbicide molecules and convert them catalytically into a product with non-herbicidal properties. Methods for producing catalytic antibodies are described in Janda et al., Science 275 (1997) 945-948, Chemical selection for catalysis in combinatorial antibody libraries; Catalytic Antibodies, 1991, Ciba Foundation Symposium 159, Wiley-Interscience Publication. By cloning the gene of this catalytic antibody and expressing it in a plant, a herbicide-resistant plant can in principle likewise be produced.

The invention relates in particular to expression cassettes whose coding sequence codes for a herbicide-binding polypeptide or its functional equivalent, and to their use for producing a herbicide-tolerant plant. The nucleic acid sequence can be, for example, a DNA or a cDNA sequence. Coding sequences suitable for insertion into an expression cassette according to the invention are, for example, those that contain a DNA sequence from a hybridoma cell which codes for a polypeptide with herbicide-binding properties and which confer on the host resistance to inhibitors of plant enzymes.

The expression cassettes according to the invention additionally contain regulatory nucleic acid sequences that control the expression of the coding sequence in the host cell. In a preferred embodiment, an expression cassette according to the invention comprises upstream, i.e. at the 5' end of the coding sequence, a promoter and downstream, i.e. at the 3' end, a polyadenylation signal and, where appropriate, further regulatory elements that are operatively linked to the intervening coding sequence for the polypeptide with herbicide-binding properties and/or transit peptide. Operative linkage means the sequential arrangement of promoter, coding sequence, terminator and, where appropriate, further regulatory elements such that each of the regulatory elements can fulfil its intended function in the expression of the coding sequence. Sequences preferred for operative linkage, without being limited thereto, are targeting sequences that ensure subcellular localization in the apoplast, in the plasma membrane, in the vacuole, in plastids, in the mitochondrion, in the endoplasmic reticulum (ER), in the nucleus, in oil bodies or in other compartments, and translation enhancers such as the 5' leader sequence from tobacco mosaic virus (Gallie et al., Nucl. Acids Res. 15 (1987) 8693-8711).
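The sequential 5' to 3' arrangement described above can be pictured with a small data structure. This is only an illustrative sketch: the class name, the field names and the short placeholder sequences are assumptions made for the example and do not come from the application, and the frame check is a deliberately simple stand-in for a real sequence analysis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExpressionCassette:
    """Hypothetical container for cassette elements, listed in 5'->3' order."""
    promoter: str
    coding_sequence: str
    polyadenylation_signal: str
    transit_peptide: Optional[str] = None  # optional targeting sequence

    def assemble(self) -> str:
        """Concatenate the elements in the direction of transcription."""
        parts = [self.promoter]
        if self.transit_peptide:
            parts.append(self.transit_peptide)
        parts.append(self.coding_sequence)
        parts.append(self.polyadenylation_signal)
        return "".join(parts)

    def in_frame(self) -> bool:
        """Check that transit peptide plus coding sequence start with ATG
        and consist of whole codons (length divisible by three)."""
        orf = (self.transit_peptide or "") + self.coding_sequence
        return orf.startswith("ATG") and len(orf) % 3 == 0


# Placeholder sequences, far shorter than any real element.
cassette = ExpressionCassette(
    promoter="TATAAATA",
    coding_sequence="GGTTCTTAA",
    polyadenylation_signal="AATAAA",
    transit_peptide="ATGGCT",
)
print(cassette.assemble())
print(cassette.in_frame())
```

Keeping the elements as separate fields mirrors the idea that each regulatory element keeps its own function while the order of the elements determines the operative linkage.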

In principle, any promoter that can control the expression of foreign genes is suitable as the promoter of the expression cassette according to the invention. Preferably, a plant promoter or a promoter originating from a plant virus is used. Particularly preferred is the CaMV 35S promoter from cauliflower mosaic virus (Franck et al., Cell 21 (1980) 285-294). This promoter contains different recognition sequences for transcriptional effectors, which together lead to permanent and constitutive expression of the introduced gene (Benfey et al., EMBO J. 8 (1989) 2195-2202).

The expression cassette according to the invention may also contain a chemically inducible promoter, by means of which the expression of the exogenous polypeptide in the plant can be controlled at a specific time. Such promoters, for example the PRP1 promoter (Ward et al., Plant. Mol. Biol. 22 (1993), 361-366), a salicylic acid-inducible promoter (WO 95/191944), a benzenesulfonamide-inducible promoter (EP 388186), an abscisic acid-inducible promoter (EP 335528) or an ethanol- or cyclohexanone-inducible promoter (WO 9321334), are described in the literature and can be used, among others.

Also particularly preferred are promoters that ensure expression in tissues or plant parts in which the herbicidal action develops. Particular mention should be made of promoters that ensure leaf-specific expression, such as the promoter of the cytosolic FBPase from potato or the ST-LSI promoter from potato (Stockhaus et al., EMBO J. 8 (1989) 2445-245).

Using a seed-specific promoter, single-chain antibodies could be stably expressed at up to 0.67% of the total soluble seed protein in the seeds of transgenic tobacco plants (Fiedler and Conrad, Bio/Technology 10 (1995), [...]). Since expression in sown or germinating seeds [...] may be desired within the scope of the present invention, such germination- and seed-specific promoters are likewise regulatory elements preferred according to the invention. The expression cassette according to the invention can therefore contain, for example, a seed-specific promoter (preferably the USP or LEB4 promoter), the LEB4 signal peptide, the gene to be expressed and an ER retention signal. The structure of the cassette is shown schematically by way of example in Figure 1 for a single-chain antibody (scFv gene).

An expression cassette according to the invention is produced by fusing a suitable promoter with a suitable polypeptide DNA and preferably a DNA coding for a chloroplast-specific transit peptide, inserted between promoter and polypeptide DNA, and a polyadenylation signal, using customary recombination and cloning techniques as described, for example, in T. Maniatis, E.F. Fritsch and J. Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989), in T.J. Silhavy, M.L. Berman and L.W. Enquist, Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1984), and in Ausubel, F.M. et al., Current Protocols in Molecular Biology, Greene Publishing Assoc. and Wiley-Interscience (1987).

[...] apoplast, plastids, the [...] or by a [...] ([...] et al., Plant J. 8 (1995) 745-750).

[...] a transit peptide that [...]. [...] chloroplasts [...] transit peptide derived from plastid transketolase [...] or a functional equivalent of this transit peptide (e.g. the transit peptide of the small subunit of Rubisco or of ferredoxin NADP oxidoreductase).

The [...] cDNA required to produce expression cassettes according to the invention is preferably [...]. The inserted nucleotide sequence coding for a herbicide-binding polypeptide can be produced synthetically or obtained naturally, or can contain a mixture of synthetic and natural DNA components. In general, synthetic nucleotide sequences are generated with codons that are preferred by plants. These plant-preferred codons can be determined from codons with the highest protein frequency that are expressed in most plant species of interest. When preparing an expression cassette, various DNA fragments can be manipulated in order to obtain a nucleotide sequence that expediently reads in the correct direction and is equipped with a correct reading frame. To join the DNA fragments to one another, adaptors or linkers can be attached to the fragments.

Expediently, the promoter and terminator regions according to the invention should be provided, in the direction of transcription, with a linker or polylinker that contains one or more restriction sites for the insertion of this sequence. As a rule, the linker has 1 to 10, usually 1 to 8, preferably 2 to 6 restriction sites. In general, the linker has a size of less than 100 bp within the regulatory regions, frequently less than 60 bp, but at least 5 bp. The promoter according to the invention can be native or homologous as well as foreign or heterologous to the host plant. The expression cassette according to the invention contains, in the 5'-3' direction of transcription, the promoter according to the invention, any desired sequence and a region for transcriptional termination. Different termination regions are interchangeable with one another as desired.
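The size and site-count limits given for the linker can be expressed as a simple check. The sketch below is an illustration only: the function names and the example linker are assumptions, the table holds four well-known recognition sequences chosen for the example rather than enzymes named in the application, and it counts distinct enzymes with at least one site instead of every site occurrence.

```python
from typing import Dict, List

# Recognition sequences of four common restriction enzymes (illustrative choice).
RECOGNITION_SITES: Dict[str, str] = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XbaI": "TCTAGA",
}


def restriction_sites(linker: str) -> List[str]:
    """Return the names of the enzymes whose recognition sequence occurs in the linker."""
    seq = linker.upper()
    return [name for name, site in RECOGNITION_SITES.items() if site in seq]


def linker_is_acceptable(linker: str) -> bool:
    """Apply the general limits from the text: 1 to 10 sites, at least 5 bp, under 100 bp."""
    n_sites = len(restriction_sites(linker))
    return 5 <= len(linker) < 100 and 1 <= n_sites <= 10


example_linker = "GAATTCGGATCCAAGCTT"  # hypothetical 18 bp polylinker
print(restriction_sites(example_linker))
print(linker_is_acceptable(example_linker))
```

The narrower preferred ranges from the text (2 to 6 sites, less than 60 bp) could be added as optional parameters in the same way.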

Furthermore, manipulations that provide suitable restriction sites or remove superfluous DNA or restriction sites can be employed. Where insertions, deletions or substitutions such as transitions and transversions are possible, in vitro mutagenesis, primer repair, restriction or ligation can be used. With suitable manipulations, such as restriction, chewing back or filling in of overhangs for blunt ends, complementary ends of the fragments can be provided for ligation.

Of particular importance for the success according to the invention is the attachment of the specific ER retention signal SEKDEL (Schouten, A. et al., Plant Mol. Biol. 30 (1996), 781-792); this increases the average expression level three- to fourfold. Other retention signals that occur naturally in plant and animal proteins localized in the ER can also be used to construct the cassette.
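At the protein level, attaching the retention signal amounts to extending the C-terminus of the polypeptide. A minimal sketch follows, assuming one-letter amino acid notation with an optional trailing "*" for the stop; the function name and the example peptide are hypothetical and not taken from the application.

```python
ER_RETENTION_SIGNAL = "SEKDEL"  # signal named in the text, one-letter amino acid code


def add_er_retention_signal(protein: str, signal: str = ER_RETENTION_SIGNAL) -> str:
    """Append the retention signal to the C-terminus unless it is already present.

    A trailing '*' (stop symbol) is stripped first so that the signal ends up
    directly at the end of the polypeptide chain.
    """
    seq = protein.rstrip("*")
    if seq.endswith(signal):
        return seq
    return seq + signal


print(add_er_retention_signal("MAQVQL*"))  # hypothetical scFv fragment
```

Passing a different `signal` argument covers the case mentioned in the text where another naturally occurring retention signal is used instead.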

Preferred polyadenylation signals are plant polyadenylation signals, preferably those that essentially correspond to T-DNA polyadenylation signals from Agrobacterium tumefaciens, in particular of gene 3 of the T-DNA (octopine synthase) of the Ti plasmid pTiACH5 (Gielen et al., EMBO J. 3 (1984) 835 ff), or functional equivalents.

Preferably, the fused expression cassette coding for a polypeptide with herbicide-binding properties is cloned into a vector, for example pBin19, that is suitable for transforming Agrobacterium tumefaciens. Agrobacteria transformed with such a vector can then be used in a known manner to transform plants, in particular crop plants such as tobacco plants, for example by bathing wounded leaves or leaf pieces in an agrobacterial solution and subsequently cultivating them in suitable media. The transformation of plants by agrobacteria is known, among other sources, from F.F. White, Vectors for Gene Transfer in Higher Plants, in: Transgenic Plants, Vol. 1, Engineering and Utilization, edited by S.D. Kung and R. Wu, Academic Press, [...], and from [...] Gelvin, Molecular Genetics of T-DNA Transfer from Agrobacterium to Plants, likewise in Transgenic Plants, pp. 49-78. From the transformed cells of the wounded leaves or leaf pieces, transgenic plants can be regenerated in a known manner that contain a gene, integrated into the expression cassette according to the invention, for the expression of a polypeptide with herbicide-binding properties.



To transform a host plant with a DNA coding for a herbicide-binding polypeptide, an expression cassette according to the invention is incorporated as an insert into a recombinant vector whose vector DNA contains additional functional regulatory signals, for example sequences for replication or integration. Suitable vectors are described, among others, in "Methods in Plant Molecular Biology and Biotechnology" (CRC Press), chapter 6/7, pp. 71-119 (1993).

Using the recombination and cloning techniques cited above, the expression cassettes according to the invention can be cloned into suitable vectors that allow them to be propagated, for example in E. coli. Suitable cloning vectors include pBR332, pUC series, M13mp series and pACYC184. Particularly suitable are binary vectors, which can replicate both in E. coli and in agrobacteria, such as pBin19 (Bevan et al. (1984) Nucl. Acids Res. 12, 8711).

A further subject of the invention relates to the use of an expression cassette according to the invention for transforming plants, plant cells, plant tissues or plant parts. The preferred aim of the use is to confer resistance to inhibitors of plant enzymes.

Depending on the choice of promoter, expression can take place specifically in the leaves, in the seeds or in other parts of the plant. Such transgenic plants, their propagation material and their plant cells, tissues or parts are a further subject of the present invention.

The transfer of foreign genes into the genome of a plant is referred to as transformation. The described [...] the biolistic approach with the gene gun, electroporation, the incubation of dry embryos in DNA-containing solution, microinjection and Agrobacterium-mediated gene transfer. The methods mentioned are described, for example, in [...] Techniques for Gene Transfer, in: Transgenic Plants, Vol. 1, Engineering and Utilization, edited by S.D. Kung and R. Wu, Academic Press (1993) 128-143, and in Potrykus, Annu. Rev. Plant Physiol. Plant Molec. Biol. 42 (1991) 205-225. Preferably, the construct to be expressed is cloned into a vector that is suitable for transforming Agrobacterium tumefaciens, for example pBin19 (Bevan et al., Nucl. Acids Res. 12 (1984) 8711).

Agrobacteria transformed with an expression cassette according to the invention can then be used in a known manner to transform plants, in particular crop plants such as cereals, maize, soya, rice, cotton, sugar beet, [...], sunflower, flax, potato, tobacco, tomato, oilseed rape, alfalfa, lettuce and the various tree, nut and vine species, e.g. by bathing wounded leaves or leaf pieces in an agrobacterial solution and subsequently cultivating them in suitable media.

Functionally equivalent sequences that code for a herbicide-binding polypeptide are, according to the invention, those sequences which, despite a differing nucleotide sequence, still have the desired functions. Functional equivalents thus comprise naturally occurring variants of the sequences described herein as well as artificial nucleotide sequences, e.g. obtained by chemical synthesis and adapted to the codon usage of a plant.

A functional equivalent is also understood to mean, in particular, natural or artificial mutations of an originally isolated sequence coding for the herbicide-binding polypeptide that continue to show the desired function. Mutations comprise substitutions, additions, deletions, transpositions or insertions of one or more nucleotide residues. Thus, for example, the present invention also encompasses those nucleotide sequences that are obtained by modifying this nucleotide sequence. The aim of such a modification can be, e.g., to further delimit the coding sequence contained therein or, e.g., to insert further restriction enzyme cleavage sites.

Functional equivalents are also those variants whose function is weakened or enhanced compared with the starting gene or gene fragment.

In addition, artificial DNA sequences are suitable as long as they induce the desired resistance to herbicides, as described above. Such artificial DNA sequences can be determined, for example, by back-translating proteins that were constructed by molecular modelling and have herbicide-binding activity, or by in vitro selection. Particularly suitable are coding DNA sequences obtained by back-translating a polypeptide sequence according to the codon usage specific to the host plant. A person skilled in plant genetic methods can easily determine the specific codon usage by computer analysis of other known genes of the plant to be transformed.
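The computer analysis mentioned above can be sketched as two steps: count the codons in known genes of the host plant, then back-translate a polypeptide with the most frequent codon for each amino acid. The following is a minimal illustration under stated assumptions: the function names are hypothetical, the codon table is deliberately partial (just enough for the demo peptide), and the two "known genes" are invented placeholder strings, not real plant sequences.

```python
from collections import Counter
from typing import Dict, Iterable

# Partial excerpt of the standard genetic code, sufficient for the demo below.
CODON_TO_AA: Dict[str, str] = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def codon_usage(genes: Iterable[str]) -> Dict[str, Counter]:
    """Count, per amino acid, how often each codon occurs in the given genes."""
    usage: Dict[str, Counter] = {}
    for gene in genes:
        seq = gene.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            aa = CODON_TO_AA.get(codon)
            if aa is not None:  # codons outside the partial table are skipped
                usage.setdefault(aa, Counter())[codon] += 1
    return usage


def back_translate(protein: str, usage: Dict[str, Counter]) -> str:
    """Back-translate a polypeptide using the most frequent codon per amino acid."""
    codons = []
    for aa in protein:
        counts = usage.get(aa)
        if not counts:
            raise ValueError(f"no codon usage data for amino acid {aa!r}")
        codons.append(counts.most_common(1)[0][0])
    return "".join(codons)


# Invented placeholder "genes" standing in for known genes of the host plant.
known_genes = ["ATGGCTAAGGCTTAA", "ATGGCCAAGAAATGA"]
usage = codon_usage(known_genes)
print(back_translate("MAK", usage))
```

A real analysis would use the full genetic code and a large set of host genes; always picking the single most frequent codon is only one of several possible selection strategies.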

Further suitable equivalent nucleic acid sequences according to the invention are sequences that code for fusion proteins, where one component of the fusion protein is a non-plant herbicide-binding polypeptide or a functionally equivalent part thereof. The second part of the fusion protein can be, e.g., a further polypeptide with enzymatic activity or an antigenic polypeptide sequence with the aid of which detection of scFv expression is possible (e.g. myc tag or his tag). Preferably, however, it is a regulatory protein sequence, such as a signal or transit peptide, that directs the polypeptide with herbicide-binding properties to the desired site of action.

The invention also relates to the expression products generated according to the invention, and to fusion proteins composed of a transit peptide and a polypeptide with herbicide-binding properties.

Resistance or tolerance means, within the scope of the present invention, the artificially acquired ability to withstand the action of plant enzyme inhibitors. It comprises partial and, in particular, complete insensitivity to these inhibitors for the duration of at least one plant generation.

The primary site of action of herbicides is generally the leaf tissue, so that leaf-specific expression of the exogenous herbicide-binding polypeptide can provide sufficient protection. It is obvious, however, that the action of a herbicide need not be restricted to the leaf tissue but can also take place tissue-specifically in all other parts of the plant.

In addition, constitutive expression of the exogenous herbicide-binding polypeptide is advantageous. On the other hand, inducible expression may also appear desirable.

The effectiveness of the transgenically expressed polypeptide with herbicide-binding properties can be determined, for example, in vitro by shoot meristem propagation on herbicide-containing medium in graded concentration series, or by seed germination tests. In addition, a change in the kind and degree of herbicide tolerance of a test plant can be tested in greenhouse trials.

The invention further relates to transgenic plants, transformed with an expression cassette according to the invention, and to transgenic cells, tissues, parts and propagation material of such plants. Particularly preferred are transgenic crop plants such as cereals, maize, soya, rice, cotton, sunflower, flax, potato, tobacco, tomato, oilseed rape, alfalfa, lettuce and the various tree, nut and vine species.

The transgenic plants, plant cells, plant tissues or plant parts can be treated with an active ingredient that inhibits plant enzymes, whereby the plants, cells, tissues or plant parts that were not successfully transformed die. Examples of suitable active ingredients are, in particular, 5-(2-chloro-4-(trifluoromethyl)phenoxy)-2-nitrobenzoic acid (acifluorfen) and 7-chloro-3-methylquinoline-8-carboxylic acid (quinmerac), as well as metabolites and functional derivatives of these compounds. The DNA coding for a polypeptide with herbicide-binding properties that is inserted into the expression cassettes according to the invention can thus also be used as a selection marker.

In crop plants in particular, the present invention offers the advantage that, after induction of a selective resistance of the crop plant to plant enzyme inhibitors, these inhibitors can be used as specific herbicides against non-resistant plants. Non-limiting examples of such inhibitors are the following herbicidal compounds from groups b1 - b41:

b1 1,3,4-Thiadiazoles:
buthidazole, cyprazole

b2 Amides:
allidochlor (CDAA), benzoylprop-ethyl, bromobutide, chlorthiamid, dimepiperate, dimethenamid, diphenamid, etobenzanid (benzchlomet), flamprop-methyl, fosamin, isoxaben, monalide, naptalame, pronamid (propyzamid), propanil

b3 [...]:
[...] glyphosate, sulfosate

b4 Aminotriazoles:
amitrol

b5 Anilides:
anilofos, mefenacet

b6 [...]:
[...] mecoprop, mecoprop-P, napropamide, napropanilide, triclopyr

b7 Benzoic acids:
chloramben, dicamba

b8 Benzothiadiazinones:
bentazon

b9 Bleachers:
clomazone (dimethazone), diflufenican, fluorochloridone, flupoxam, fluridone, pyrazolate, sulcotrione (chlormesulone)

b10 Carbamates:
asulam, barban, butylate, carbetamid, chlorbufam, chlorpropham, cycloate, desmedipham, diallate, EPTC, esprocarb, molinate, orbencarb, pebulate, phenisopham, phenmedipham, propham, prosulfocarb, pyributicarb, sulfallate (CDEC), terbucarb, thiobencarb (benthiocarb), tiocarbazil, triallate, vernolate

b11 Quinoline acids:
quinclorac, quinmerac

b12 Chloroacetanilides:
acetochlor, alachlor, butachlor, butenachlor, diethatyl-ethyl, dimethachlor, metazachlor, metolachlor, pretilachlor, propachlor, prynachlor, terbuchlor, thenylchlor, xylachlor

b13 Cyclohexenones:
alloxydim, clethodim, cloproxydim, cycloxydim, sethoxydim, tralkoxydim, 2-{1-[2-(4-chlorophenoxy)propyloxyimino]butyl}-3-hydroxy-5-(2H-tetrahydrothiopyran-3-yl)-2-cyclohexen-1-one

b14 Dichloropropionic acids:
dalapon

b15 Dihydrobenzofurans:
ethofumesate

b16 Dihydrofuran-3-ones:
flurtamone

b17 Dinitroanilines:
[...] dinitramin, ethalfluralin, fluchloralin, isopropalin, nitralin, oryzalin, pendimethalin, prodiamine, profluralin, trifluralin

b18 Dinitrophenols:
bromofenoxim, dinoseb, dinoseb acetate, dinoterb, DNOC

b19 Diphenyl ethers:
acifluorfen-sodium, aclonifen, bifenox, chlornitrofen (CNP), difenoxuron, ethoxyfen, fluorodifen, fluoroglycofen-ethyl, fomesafen, furyloxyfen, lactofen, nitrofen, nitrofluorfen, oxyfluorfen

b20 Dipyridylenes:
cyperquat, difenzoquat methyl sulfate, diquat, paraquat dichloride

b21 Ureas:
benzthiazuron, buturon, chlorbromuron, chloroxuron, chlortoluron, cumyluron, dibenzyluron, cycluron, dimefuron, diuron, dymron, ethidimuron, fenuron, fluormeturon, isoproturon, isouron, karbutilat, linuron, methabenzthiazuron, metobenzuron, metoxuron, monolinuron, monuron, neburon, siduron, tebuthiuron, trimeturon

b22 Imidazoles:
isocarbamid

b23 Imidazolinones:
imazamethapyr, imazapyr, imazaquin, imazethabenz-methyl (imazame), imazethapyr

b24 Oxadiazoles:
methazole, oxadiargyl, oxadiazon

b25 Oxiranes:
tridiphane

b26 Phenols:
bromoxynil, ioxynil

b27 Phenoxyphenoxypropionic acid esters:
clodinafop, cyhalofop-butyl, diclofop-methyl, fenoxaprop-ethyl, fenoxaprop-p-ethyl, fenthiapropethyl, fluazifop-butyl, fluazifop-p-butyl, haloxyfop-ethoxyethyl, haloxyfop-methyl, haloxyfop-p-methyl, isoxapyrifop, propaquizafop, quizalofop-ethyl, quizalofop-p-ethyl, quizalofop-tefuryl

b28 Phenylacetic acids:
chlorfenac (fenac)

b29 Phenylpropionic acids:
chlorophenprop-methyl

b30 Protoporphyrinogen IX oxidase inhibitors:
benzofenap, cinidon-ethyl, flumiclorac-pentyl, flumioxazin,
flumipropyn, flupropacil, fluthiacet-methyl, pyrazoxyfen,
sulfentrazone, thidiazimin

b31 Pyrazoles:
nipyraclofen

b32 Pyridazines:
chloridazon, maleic hydrazide, norflurazon, pyridate

b33 Pyridinecarboxylic acids:
clopyralid, dithiopyr, picloram, thiazopyr

b34 Pyrimidyl ethers:
pyrithiobac acid, pyrithiobac-sodium, KIH-2023, KIH-6127

b35 Sulfonamides:
flumetsulam, metosulam

b36 Sulfonylureas:
amidosulfuron, azimsulfuron, bensulfuron-methyl, chlorimuron-ethyl,
chlorsulfuron, cinosulfuron, cyclosulfamuron, ethametsulfuron-methyl,
ethoxysulfuron, flazasulfuron, halosulfuron-methyl, imazosulfuron,
metsulfuron-methyl, nicosulfuron, primisulfuron, prosulfuron,
pyrazosulfuron-ethyl, rimsulfuron, sulfometuron-methyl,
thifensulfuron-methyl, triasulfuron, tribenuron-methyl,
triflusulfuron-methyl

b37 Triazines:
ametryn, atrazine, aziprotryn, cyanazine, cyprazine, desmetryn,
dimethamethryn, dipropetryn, eglinazine-ethyl, hexazinone,
procyazine, prometon, prometryn, propazine, secbumeton, simazine,
simetryn, terbumeton, terbutryn, terbuthylazine, trietazine

b38 Triazinones:
ethiozin, metamitron, metribuzin

b39 Triazolecarboxamides:
triazofenamid

b40 Uracils:
bromacil, lenacil, terbacil

[...] cafenstrole, chlorthal-dimethyl (DCPA), cinmethylin,
dichlobenil, endothall, fluorbentranil, mefluidide, perfluidone,
piperophos

Functionally equivalent derivatives of plant [...] have a spectrum
of action comparable to that of the specifically named substances,
with lower, equal or higher inhibitory activity (expressed, for
example, in g of inhibitor per hectare of cultivated area required
to completely suppress the growth of non-resistant plants).

The invention is illustrated by the examples which follow, but is
not limited to them:

General cloning methods

The cloning steps carried out in the context of the present
invention, e.g. restriction cleavages, agarose gel electrophoresis,
purification of DNA fragments, transfer of nucleic acids to
nitrocellulose and nylon membranes, linking of DNA fragments,
transformation of E. coli cells, culturing of bacteria, propagation
of phages and sequence analysis of recombinant DNA, were carried out
as described by Sambrook et al. (1989), Cold Spring Harbor
Laboratory Press; ISBN 0-87969-309-6.

The bacterial strains used below (E. coli, XL-I Blue) were obtained
from Stratagene. The Agrobacterium strain used for plant
transformation ([...] with the plasmid pGV2260 or pGV3850kan) was
described by Deblaere et al. (Nucl. Acids Res. 13 (1985) 4777).
Alternatively, the Agrobacterium strain LBA4404 (Clontech) or other
suitable strains can also be employed. The vectors used for cloning
were [...] (Invitrogen), pBin19 (Bevan et al., Nucl. Acids Res. 12
(1984) 8711-8720) and pBinAR (Höfgen and Willmitzer, Plant Science
66 (1990) 221-230).

Sequence analysis of recombinant DNA

Recombinant DNA molecules were sequenced using a laser fluorescence
DNA sequencer from [...] by the method of Sanger (Sanger et al.,
Proc. Natl. Acad. Sci. USA 74 (1977), 5463-5467).

Generation of plant expression cassettes

A 35S CaMV promoter was inserted into the plasmid pBin19 (Bevan et
al., Nucl. Acids Res. 12, 8711 (1984)) as an EcoRI-KpnI fragment
corresponding to nucleotides 6909-7437 of cauliflower mosaic virus
(Franck et al., Cell 21 (1980) 285). The polyadenylation signal of
gene 3 of the T-DNA of the Ti plasmid pTiACH5 (Gielen et al., EMBO
J. 3 (1984) 835), nucleotides 11749-11939, was isolated as a
PvuII-HindIII fragment and, after addition of SphI linkers to the
PvuII cleavage site, cloned between the SphI and HindIII cleavage
sites of the vector. This resulted in the plasmid pBinAR (Höfgen and
Willmitzer, Plant Science 66 (1990) 221-230).
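
The nucleotide ranges quoted above (6909-7437 and 11749-11939) fix the sizes of the two inserted elements. As a quick plausibility check, the following minimal Python sketch computes those sizes. It assumes the quoted coordinates are 1-based and inclusive, which the text does not state; the function name `fragment_length` is purely illustrative.

```python
def fragment_length(start: int, end: int) -> int:
    """Return the length in bp of a fragment spanning start..end.

    Assumption: coordinates are 1-based and inclusive.
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates as quoted in the description of pBinAR.
promoter_35s = fragment_length(6909, 7437)    # CaMV 35S promoter fragment
polya_signal = fragment_length(11749, 11939)  # polyadenylation signal of gene 3

print(f"35S promoter fragment: {promoter_35s} bp")
print(f"polyadenylation signal: {polya_signal} bp")
```

Under that assumption the promoter fragment is 529 bp and the polyadenylation signal 191 bp; the added SphI linkers are not included in these figures.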



Application examples

Example 1



Since herbicides are not immunogenic, they must be coupled to a
carrier material such as KLH. If the molecule contains a reactive
group, this coupling can take place directly; otherwise, a
functional group is introduced during the synthesis of the
herbicide, or a reactive precursor is selected during the synthesis,
so that these molecules can be coupled to the carrier molecule in a
simple reaction step. Examples of couplings are described by
Miroslavic Ferencik in "Handbook of Immunochemistry", 1993, Chapman
& Hall, in the chapter on antigens, pages 20-49.

Balb/c mice, for example, are immunized by repeated injection of
this modified carrier molecule (antigen). As soon as sufficient
antibodies binding to the antigen are detectable in the serum by
ELISA (enzyme-linked immunosorbent assay), spleen cells are removed
from these animals and fused with myeloma cells in order to
cultivate hybrids. In the ELISA, "herbicide-modified BSA" is
additionally used as antigen in order to distinguish the immune
response directed against the herbicide from the KLH response.

Monoclonal antibodies are prepared on the basis of known methods, as
described, for example, in "Practical Immunology", Leslie Hudson and
Frank Hay, Blackwell Scientific Publications, 1989, or in
"Monoclonal Antibodies: Principles and Practice", James Goding,
1983, Academic Press, Inc., or in "A practical guide to monoclonal
antibodies", J. Liddell and A. Cryer, 1991, John Wiley & Sons; or
Achim Möller and Franz Emling, "Monoklonale Antikörper gegen TNF und
deren Verwendung", European Patent EP-A 260610.

Example 2

The starting point of the investigation was a monoclonal antibody
which specifically recognizes the herbicide quinmerac and which also
has a high binding affinity. The selected hybridoma cell line is
characterized in that the secreted monoclonal antibodies directed
against the herbicide antigen quinmerac have a high affinity and the
specific sequences of the immunoglobulins are available (Berek, C.
et al., Nature 316, 412-418 (1985)). This monoclonal antibody
against quinmerac was the starting point for constructing the
single-chain antibody fragment (scFv-antiQuinmerac).

First, mRNA was isolated from the hybridoma cells and transcribed
into cDNA. This cDNA served as template for amplifying the variable
immunoglobulin genes VH and VK with the specific primers VH1 BACK
and VH FOR-2 for the heavy chain and VK2 BACK and MJK5 FON X for the
light chain (Clackson et al., Nature 352, 624-628 (1991)). The
isolated variable immunoglobulins were the starting point for
constructing a single-chain antibody fragment (scFv-antiQuinmerac).
In the subsequent fusion PCR, three components, namely VH, VK and a
linker fragment, were combined in one PCR mixture and the
scFv-antiQuinmerac was amplified (Fig. 3).

The functional characterization (antigen-binding activity) of the
constructed scFv-antiQuinmerac gene took place after expression in a
bacterial system. For this purpose, the scFv-antiQuinmerac was
synthesized as a soluble antibody fragment in E. coli by the method
of Hoogenboom, H.R. et al., Nucleic Acids Research 19, 4133-4137
(1991). The activity and the specificity of the constructed antibody
fragment were checked in ELISA tests (Fig. 4).

To enable seed-specific expression of the antibody fragment in
tobacco, the scFv-antiQuinmerac gene was cloned downstream of the
LeB4 promoter. The LeB4 promoter isolated from Vicia faba shows
strictly seed-specific expression of various foreign genes in
tobacco (Bäumlein, H. et al., Mol. Gen. Genet. 225, 121-128 (1991)).
Transport of the scFv-antiQuinmerac polypeptide into the endoplasmic
reticulum resulted in stable accumulation of large amounts of
antibody fragment. For this purpose, the scFv-antiQuinmerac gene was
fused to a signal peptide sequence, which ensures entry into the
endoplasmic reticulum, and to the ER retention signal SEKDEL, which
ensures retention in the ER (Wandelt et al., 1992) (Fig. 5).

The constructed expression cassette was cloned into the binary
vector pGSGLUC1 (Saito et al., 1990) and transferred by
electroporation into the Agrobacterium strain EHA 101. Recombinant
Agrobacterium clones were used for the subsequent transformation of
Nicotiana tabacum. 70-140 tobacco plants were regenerated per
construct. After self-fertilization, seeds at various stages of
development were harvested from the regenerated transgenic tobacco
plants. The soluble proteins were obtained from these seeds after
extraction in an aqueous buffer system. Analysis of the transgenic
plants shows that fusing the scFv-antiQuinmerac gene to the DNA
sequence of the ER retention signal SEKDEL made it possible to
achieve a maximum accumulation of 1.9% scFv-antiQuinmerac protein in
the mature seed.

The constructed scFv-antiQuinmerac gene had a size of 735 bp. The
variable domains were fused to one another in the order VH-L-VL.
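
The stated gene size can be related to the length of the encoded polypeptide with simple arithmetic. The following Python sketch is only an illustration: it assumes the 735 bp form one uninterrupted reading frame, and the text does not say whether that figure includes a stop codon, the signal peptide or a tag. The function name `codon_count` is hypothetical.

```python
def codon_count(length_bp: int) -> int:
    """Number of complete codons in a reading frame of length_bp base pairs."""
    if length_bp <= 0:
        raise ValueError("length must be positive")
    if length_bp % 3 != 0:
        raise ValueError("length is not a multiple of three, so not a whole number of codons")
    return length_bp // 3


SCFV_GENE_BP = 735  # size stated in the text

codons = codon_count(SCFV_GENE_BP)
print(f"{SCFV_GENE_BP} bp correspond to {codons} codons")
```

Under these assumptions 735 bp correspond to 245 codons, a plausible size for a VH-linker-VL single-chain fragment.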

The specific selectivity was determined in extracts of the mature
tobacco seeds using a direct ELISA. The values obtained clearly show
that the protein extracts contain functionally active antibody
fragments.

Example 3

Seed-specific expression and accumulation of single-chain antibody
fragments in the endoplasmic reticulum of cells of transgenic
tobacco seeds, controlled by the USP promoter.



The starting point of the investigations was a single-chain antibody
fragment against the herbicide quinmerac (scFv-antiQuinmerac). The
functional characterization (antigen-binding activity) of this
constructed scFv-antiQuinmerac gene took place after expression in a
bacterial system and after expression in [...]. The activity and the
specificity of the constructed antibody fragment were checked in
ELISA tests.

To enable seed-specific expression of the antibody fragment in
tobacco, the scFv-antiQuinmerac gene was cloned downstream of the
USP promoter. The USP promoter isolated from Vicia faba shows
strictly seed-specific expression of various foreign genes in
tobacco (Fiedler, U. et al., Plant Mol. Biol. 22, 669-679 (1993)).
Transport of the scFv-antiQuinmerac polypeptide into the endoplasmic
reticulum resulted in stable accumulation of large amounts of
antibody fragment. For this purpose, the scFv-antiQuinmerac gene was
fused to a signal peptide sequence, which ensures entry into the
endoplasmic reticulum, and to the ER retention signal SEKDEL, which
ensures retention in the ER (Wandelt et al., 1992) (Fig. 1).

The constructed expression cassette was cloned into the binary
vector pGSGLUC1 (Saito et al., 1990) and transferred by
electroporation into the Agrobacterium strain EHA 101. Recombinant
Agrobacterium clones were used for the subsequent transformation of
Nicotiana tabacum. After self-fertilization, seeds at various stages
of development were harvested from the regenerated transgenic
tobacco plants. The soluble proteins were obtained from these seeds
after extraction in an aqueous buffer system. Analysis of the
transgenic plants shows that, as a result of fusing the
scFv-antiAcifluorfen gene to the DNA sequence of the ER retention
signal SEKDEL under the control of the USP promoter, single-chain
antibody fragments with binding affinity for quinmerac were
synthesized as early as day 10 of seed development.

Example 4

To achieve ubiquitous expression of the antibody fragment in the
plant, especially in leaves, the scFv-antiQuinmerac gene was cloned
downstream of the CaMV 35S promoter. This strong constitutive
promoter mediates expression of foreign genes in virtually all plant
tissues (Benfey and Chua, Science 250 (1990), 956-966). Transport of
the scFv-antiQuinmerac polypeptide into the endoplasmic reticulum
resulted in stable accumulation of large amounts of antibody
fragment in the leaf material. The scFv-antiQuinmerac gene was first
fused to a signal peptide sequence, which ensures entry into the
endoplasmic reticulum, and to the ER retention signal KDEL, which
ensures retention in the ER (Wandelt et al., Plant J. 2 (1992),
181-192). The constructed expression cassette was cloned into the
binary vector pGSGLUC1 (Saito et al., Plant Cell Rep. 8 (1990),
718-721) and transferred by electroporation into the Agrobacterium
strain EHA 101.

Recombinant Agrobacterium clones were used for the subsequent
transformation of Nicotiana tabacum. [...] tobacco plants were
regenerated. Leaf material at various stages of development was
[...] from the regenerated transgenic tobacco plants. The soluble
proteins were obtained from this leaf material after extraction in
an aqueous buffer system. [...] analyses (Western blot analyses and
ELISA tests) showed that a maximum accumulation of more than 2%
[...] of active, antigen-binding scFv-antiQuinmerac [...] could be
achieved in the leaves. The high expression levels were [...] in
fully grown [...], but the antibody fragment could also be detected
in senescent leaf material.

Example 5

[...] cDNA coding for the [...] synthetic oligonucleotides.

[...] single-chain antibody cDNA was [...] in a [...] 200 µM
nucleotides [...] (at 25°C, 1.5 mM MgCl2) and 0.02 U/µl [...]
polymerase (Perkin Elmer). The amplification conditions were set as
follows:

Annealing temperature: 45°C
Denaturation temperature: 94°C
Elongation temperature: 72°C
Number of cycles: [...]

The result was a [...]. [...] application and optimization of the
[...] Innis et al., 1990, PCR Protocols, A Guide to Methods and
Applications, Academic Press.

Example 6

[...] which express a cDNA coding for [...].

[...] of an overnight culture of a positively transformed
[Agro]bacterium colony in [...] of sterile plants (each about 1 cm²)
were [...] minutes [...]. [...] was continued [...] Claforan
(cefotaxime sodium), 50 mg/l kanamycin, 1 mg/l benzylaminopurine
(BAP), 0.2 mg/l naphthylacetic acid and 1.6 g/l glucose. Growing
shoots were transferred to MS medium containing 2% sucrose, 250 mg/l
Claforan and 0.8% Bacto agar.

Example 7

Stable accumulation of the single-chain antibody fragment against
the herbicide quinmerac in the endoplasmic reticulum.

The starting point of the investigations was a single-chain antibody
fragment against the herbicide quinmerac (scFv-antiQuinmerac)
expressed in tobacco plants. The amount and activity of the
synthesized scFv-antiQuinmerac polypeptide were determined in
Western blot analyses and ELISA tests.

To enable expression of the scFv-antiQuinmerac gene in the
endoplasmic reticulum, the foreign gene was expressed under the
control of the CaMV 35S promoter as a translational fusion with the
LeB4 signal peptide (N-terminal) and the ER retention signal KDEL
(C-terminal). Transport of the scFv-antiQuinmerac polypeptide into
the endoplasmic reticulum resulted in stable accumulation of large
amounts of active antibody fragment. After the leaf material had
been harvested, pieces were frozen at -20°C (1), lyophilized (2) or
dried at room temperature (3). The soluble proteins were obtained
from the respective leaf material by extraction in an aqueous
buffer, and the scFv-antiQuinmerac polypeptide was purified by
affinity chromatography. Equal amounts of purified
scFv-antiQuinmerac polypeptide (frozen, lyophilized and dried) were
used to determine the activity of the antibody fragment (Fig. 6).
Fig. 6A shows the antigen-binding activity of the scFv-antiQuinmerac
polypeptide purified from fresh (1), lyophilized (2) and dried (3)
leaves. In Fig. 6B, the respective amounts of scFv-antiQuinmerac
protein (about 100 ng) used for the ELISA analyses were determined
by Western blot analyses. The sizes of the protein molecular weight
standards are shown on the left. Approximately equal antigen-binding
activities were found.

Example 8

To demonstrate the herbicide tolerance of the transgenic tobacco
plants producing a polypeptide with herbicide-binding properties,
the plants were treated with different amounts of acifluorfen or
quinmerac. In all cases it was possible to show in the greenhouse
that the plants expressing an scFv-antiAcifluorfen or an
scFv-antiQuinmerac are tolerant to the corresponding herbicides by
comparison with the control.






WO 98/42852



PCT/EP98/01731 






Claims

1. A process for producing herbicide-tolerant plants by expressing
   an exogenous herbicide-binding polypeptide in the plants.

2. A process as claimed in claim 1, wherein the exogenous
   herbicide-binding polypeptide is a single-chain antibody
   fragment.

3. A process as claimed in claim 1, wherein the exogenous
   herbicide-binding polypeptide is a complete antibody or a
   fragment derived therefrom.

4. A process as claimed in claim 1, wherein the herbicide is
   5-(2-chloro-4-(trifluoromethyl)phenoxy)-2-nitrobenzoic acid.

5. A process as claimed in claim 1, wherein the herbicide is
   7-chloro-3-methylquinoline-8-carboxylic acid.

6. A process as claimed in any of claims 1-3, wherein the plants are
   monocotyledonous or dicotyledonous plants.

7. A process as claimed in claim 6, wherein the plant is tobacco.

8. A process as claimed in any of claims 1-7, wherein the exogenous
   polypeptide is expressed constitutively in the plant.

9. A process as claimed in any of claims 1-7, wherein the expression
   of the exogenous polypeptide is induced in the plant.

10. A process as claimed in any of claims 1-7, wherein the exogenous
    polypeptide is expressed in the leaves of the plant.

11. A process as claimed in any of claims 1-7, wherein the exogenous
    polypeptide is expressed in the seeds of the plant.

12. An expression cassette for plants, consisting of a promoter, a
    signal peptide, a gene coding for the expression of an exogenous
    herbicide-binding polypeptide, an ER retention signal and a
    terminator.

13. An expression cassette as claimed in claim 12, wherein the CaMV
    35S promoter is used as constitutive promoter.

14. An expression cassette as claimed in claim 12, wherein the gene
    of a single-chain antibody fragment is used as the gene to be
    expressed.

15. An expression cassette as claimed in claim 12, wherein the gene
    or gene fragment of a herbicide-binding polypeptide is used as
    the gene to be expressed, as a translational fusion with other
    functional proteins such as, for example, enzymes, toxins,
    chromophores and binding proteins.

16. An expression cassette as claimed in claim 12, wherein the
    polypeptide gene to be expressed is obtained from a hybridoma
    cell or with the aid of other recombinant methods, for example
    the antibody phage display method.

17. The use of the expression cassette as claimed in claim 12 for
    transforming dicotyledonous or monocotyledonous plants which
    express an exogenous herbicide-binding polypeptide
    constitutively, seed-specifically or leaf-specifically.

18. The use as claimed in claim 17, wherein the expression cassette
    is transferred into a bacterial strain and the resulting
    recombinant clones are used for transforming dicotyledonous or
    monocotyledonous plants which express an exogenous
    herbicide-binding polypeptide constitutively, seed-specifically
    or leaf-specifically.

19. The use of the expression cassette as claimed in claim 12 as a
    selection marker.

20. The use of a transformed plant as obtained according to claim 18
    or 19 for producing a herbicide-binding polypeptide.

21. A process for transforming a plant by introducing a gene
    sequence which codes for a herbicide-binding polypeptide into a
    plant cell, into callus tissue, a whole plant and protoplasts of
    plant cells.

22. A process as claimed in claim 21, wherein the transformation
    takes place with the aid of an Agrobacterium, in particular of
    the species Agrobacterium tumefaciens.

23. A process as claimed in claim 21, wherein the transformation
    takes place with the aid of electroporation.

24. A process as claimed in claim 21, wherein the transformation
    takes place with the aid of the particle bombardment method.

25. The production of a herbicide-binding polypeptide by expressing
    a gene coding for such a polypeptide in a plant or in cells of a
    plant and subsequently isolating the polypeptide.

26. A plant containing an expression cassette as claimed in claim
    12, wherein the expression cassette confers tolerance to a
    herbicide.

27. A plant as claimed in claim 26, which is tolerant to
    5-(2-chloro-4-(trifluoromethyl)phenoxy)-2-nitrobenzoic acid.

28. A plant as claimed in claim 26, which is tolerant to
    7-chloro-3-methylquinoline-8-carboxylic acid.

29. A method for controlling undesired plant growth in transgenic
    herbicide-resistant crop plants, wherein herbicides are used
    against which the crop plant forms herbicide-binding
    polypeptides or antibodies.

30. Herbicide-binding polypeptides or antibodies with high binding
    affinity for 5-(2-chloro-4-(trifluoromethyl)phenoxy)-
    2-nitrobenzoic acid, which are produced as claimed in claim 25.

31. Herbicide-binding polypeptides or antibodies with high binding
    affinity for 7-chloro-3-methylquinoline-8-carboxylic acid, which
    are produced as claimed in claim 25.

Fig. 1

Construct labels: Prom., SP, scFv, Tag, Term.; SEKDEL
Legend: USP promoter; LeB4 signal peptide; single-chain Fv;
c-myc tag; CaMV 35S terminator

Fig. 2

Construct labels: Prom., SP, scFv, Tag, Term.
Legend: CaMV 35S promoter; LeB4 signal peptide; single-chain Fv;
c-myc tag; CaMV 35S terminator

Fig. 3

Fusion PCR
Primers: VH1 Back SfiI, JK5 NOT 10
Components: VH, linker, VK

Fig. 4

Labels: Ig, scFv, dilution

Fig. 5

Construct labels: Prom., SP, scFv, Tag, Term.; SEKDEL
Legend: LeB4 promoter; LeB4 signal peptide; single-chain Fv-ox;
c-myc tag; CaMV 35S terminator